[jira] [Resolved] (IMPALA-8713) MathFunctions::Unhex() can overflow stack

2019-06-27 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8713.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> MathFunctions::Unhex() can overflow stack
> -
>
> Key: IMPALA-8713
> URL: https://issues.apache.org/jira/browse/IMPALA-8713
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: crash
> Fix For: Impala 3.3.0
>
>
> It allocates a stack buffer large enough to hold the output string. It 
> shouldn't be hard to see what could go wrong. E.g.
> {code}
> select unhex(repeat('z', 1024 * 1024 * 1024))
> {code}



[jira] [Commented] (IMPALA-8713) MathFunctions::Unhex() can overflow stack

2019-06-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874690#comment-16874690
 ] 

ASF subversion and git services commented on IMPALA-8713:
-

Commit c353cf7a648651244ac39677d0cb028e704281d0 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c353cf7 ]

IMPALA-8713: fix stack overflow in unhex()

Write the results into the output heap buffer
instead of into a temporary stack buffer.

No additional memory is used because
AnyValUtil::FromBuffer() allocated a temporary
buffer anyway.

Testing:
Added a targeted test to expr-test that caused
a crash before this fix.

Change-Id: Ie0c1760511a04c0823fc465cf6e529e9681b2488
Reviewed-on: http://gerrit.cloudera.org:8080/13743
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
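
As an illustration of the fix described above, here is a minimal sketch written against the public UDF API in udf/udf.h. It is hedged and illustrative only, not the actual MathFunctions::Unhex() source: the buggy version sized a stack array by the (user-controlled) input length, while the fix decodes directly into the heap buffer owned by the output StringVal.

{code:cpp}
// Hedged sketch, not the committed Impala code.
#include <cstdint>
#include "udf/udf.h"

using namespace impala_udf;

StringVal UnhexSketch(FunctionContext* ctx, const StringVal& input) {
  if (input.is_null) return StringVal::null();
  const int result_len = input.len / 2;
  // Fix: allocate the result on the heap via the output StringVal instead of
  // `uint8_t buf[result_len]` on the stack, which a ~1GB input would overflow.
  StringVal result(ctx, result_len);
  if (result.is_null) return StringVal::null();  // allocation failed
  auto hex_digit = [](uint8_t c) -> int {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
  };
  for (int i = 0; i < result_len; ++i) {
    const int hi = hex_digit(input.ptr[2 * i]);
    const int lo = hex_digit(input.ptr[2 * i + 1]);
    if (hi < 0 || lo < 0) return StringVal();  // invalid hex digit: empty result
    result.ptr[i] = static_cast<uint8_t>((hi << 4) | lo);
  }
  return result;
}
{code}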


> MathFunctions::Unhex() can overflow stack
> -
>
> Key: IMPALA-8713
> URL: https://issues.apache.org/jira/browse/IMPALA-8713
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: crash
>
> It allocates a stack buffer large enough to hold the output string. It 
> shouldn't be hard to see what could go wrong. E.g.
> {code}
> select unhex(repeat('z', 1024 * 1024 * 1024))
> {code}






[jira] [Commented] (IMPALA-8612) NPE when DropTableOrViewStmt analysis leaves serverName_ NULL

2019-06-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874689#comment-16874689
 ] 

ASF subversion and git services commented on IMPALA-8612:
-

Commit e59e9261339a7a353c05548df84ff0d8793b768d in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e59e926 ]

IMPALA-8612: Fix sporadic NPE when dropping an authorized table

In the analyze() function of DropTableOrViewStmt it's possible that
serverName_ is not set when analyzer.getTable() throws. As a result
when the Catalog executes the drop table DDL it runs into a failing
Precondition check and throws a NullPointerException when updating
user privileges. Note, to run into the NPE it's required to have
authorization enabled.

Change-Id: I70bd7ca4796b24920ee156436bf8bbc682e7d952
Reviewed-on: http://gerrit.cloudera.org:8080/13508
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> NPE when DropTableOrViewStmt analysis leaves serverName_ NULL
> -
>
> Key: IMPALA-8612
> URL: https://issues.apache.org/jira/browse/IMPALA-8612
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0, Impala 3.2.0
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Critical
>
> This line is skipped when analyzer.getTable() throws a few lines above:
> https://github.com/apache/impala/blob/cd30949102425e28adadb51232653d910ac8422f/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java#L117
> As a result, a drop table statement (with authorization enabled) can throw a 
> NullPointerException here:
> https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/sentry/SentryCatalogdAuthorizationManager.java#L381






[jira] [Commented] (IMPALA-8694) Impalad crash when use uda in sql

2019-06-27 Thread WangSheng (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874633#comment-16874633
 ] 

WangSheng commented on IMPALA-8694:
---

{code:java}
CREATE AGGREGATE FUNCTION avg_test(double)
RETURNS double
INTERMEDIATE string
LOCATION '/user/impala/udf/libudasample.so'
UPDATE_FN='_Z9AvgUpdatePN10impala_udf15FunctionContextERKNS_9DoubleValEPPh'
INIT_FN='_Z7AvgInitPN10impala_udf15FunctionContextEPPh'
MERGE_FN='_Z8AvgMergePN10impala_udf15FunctionContextERKPhPS2_'
FINALIZE_FN='_Z11AvgFinalizePN10impala_udf15FunctionContextERKPh';
{code}
Creating the UDA like this works, thanks Tim!
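
For anyone hitting the same crash: below is a rough, hypothetical sketch of an avg-style UDA whose intermediate state is a 16-byte struct carried in a buffer intermediate. It is not the exact uda-sample.cc code (names and layout are assumptions), but it illustrates why the INTERMEDIATE string clause above matters: the UDA treats its intermediate as a 16-byte buffer, which does not fit in a slot sized for the DOUBLE return type.

{code:cpp}
// Hypothetical sketch of an avg-style UDA; names and layout are assumptions.
#include <cstdint>
#include <cstring>
#include "udf/udf.h"

using namespace impala_udf;

struct AvgState {
  double sum;
  int64_t count;
};

void AvgInit(FunctionContext* ctx, StringVal* intermediate) {
  intermediate->is_null = false;
  intermediate->len = sizeof(AvgState);                  // 16 bytes, not 8
  intermediate->ptr = ctx->Allocate(intermediate->len);
  memset(intermediate->ptr, 0, intermediate->len);
}

void AvgUpdate(FunctionContext* ctx, const DoubleVal& input, StringVal* intermediate) {
  if (input.is_null) return;
  AvgState* state = reinterpret_cast<AvgState*>(intermediate->ptr);
  state->sum += input.val;
  ++state->count;
}
{code}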

> Impalad crash when use uda in sql
> -
>
> Key: IMPALA-8694
> URL: https://issues.apache.org/jira/browse/IMPALA-8694
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: WangSheng
>Priority: Major
>
> Recently, I tested the avg() UDA in uda-sample.cc, and impalad crashed. 
> Here is the SQL used to create the function:
> {code:java}
> CREATE AGGREGATE FUNCTION avg_test(double)
> RETURNS double
> LOCATION '/user/impala/udf/libudasample.so'
> UPDATE_FN='_Z9AvgUpdatePN10impala_udf15FunctionContextERKNS_9DoubleValEPPh'
> INIT_FN='_Z7AvgInitPN10impala_udf15FunctionContextEPPh'
> MERGE_FN='_Z8AvgMergePN10impala_udf15FunctionContextERKPhPS2_'
> FINALIZE_FN='_Z11AvgFinalizePN10impala_udf15FunctionContextERKPh';
> {code}
> and here is the error info in hs_err_pid*.log:
> {code:java}
> Stack: [0x7f48b1c3a000,0x7f48b243b000], sp=0x7f48b24390e8, free 
> space=8188k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C [libc.so.6+0x8570e] memset+0xee
> C [impalad+0x1081e65] impala::AggFnEvaluator::Init(impala::Tuple*)+0x65
> C [impalad+0x1073efc] 
> impala::Aggregator::InitAggSlots(std::vector std::allocator > const&, impala::Tuple*)+0x9c
> C [impalad+0x106881d] 
> impala::NonGroupingAggregator::ConstructSingletonOutputTuple(std::vector  std::allocator > const&, impala::MemPool*)+0xad
> C [impalad+0x1068fa7] 
> impala::NonGroupingAggregator::Open(impala::RuntimeState*)+0x47
> C [impalad+0x1030f50] 
> impala::AggregationNode::Open(impala::RuntimeState*)+0x230
> C [impalad+0xb5e05e] impala::FragmentInstanceState::Open()+0x2ae
> C [impalad+0xb5fc2d] impala::FragmentInstanceState::Exec()+0x1cd
> C [impalad+0xb48002] 
> impala::QueryState::ExecFInstance(impala::FragmentInstanceState*)+0x272
> C [impalad+0xd0a912] impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function, impala::ThreadDebugInfo 
> const*, impala::Promise*)+0x2f2
> C [impalad+0xd0b45a] boost::detail::thread_data (*)(std::string const&, std::string const&, boost::function, 
> impala::ThreadDebugInfo const*, impala::Promise (impala::PromiseMode)0>*), boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value 
> >, boost::_bi::value, 
> boost::_bi::value*> > > 
> >::run()+0x7a
> {code}
> However, when I tested count() in uda-sample.cc, everything was OK. Here is 
> the SQL used to create the count() UDA:
> {code:java}
> CREATE AGGREGATE FUNCTION count_test(bigint)
> RETURNS bigint
> LOCATION '/user/impala/udf/libudasample.so'
> UPDATE_FN='_Z11CountUpdatePN10impala_udf15FunctionContextERKNS_6IntValEPNS_9BigIntValE'
> INIT_FN='_Z9CountInitPN10impala_udf15FunctionContextEPNS_9BigIntValE'
> MERGE_FN='_Z10CountMergePN10impala_udf15FunctionContextERKNS_9BigIntValEPS2_'
> FINALIZE_FN='_Z13CountFinalizePN10impala_udf15FunctionContextERKNS_9BigIntValE';
> {code}
> This problem happened on both version 3.1.0 and 2.12.0. I'm not sure whether 
> this is a bug or a problem with my environment.






[jira] [Commented] (IMPALA-8577) Crash during OpenSSLSocket.read

2019-06-27 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874572#comment-16874572
 ] 

Sahil Takiar commented on IMPALA-8577:
--

I see a new wildfly-openssl release: 
[https://github.com/wildfly/wildfly-openssl/releases/tag/1.0.7.Final] and it looks 
like the AWS SDK has been updated. We should re-test with the upgraded versions.

> Crash during OpenSSLSocket.read
> ---
>
> Key: IMPALA-8577
> URL: https://issues.apache.org/jira/browse/IMPALA-8577
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: David Rorke
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: 5ca78771-ad78-4a29-31f88aa6-9bfac38c.dmp, 
> hs_err_pid6313.log, 
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.ERROR.20190521-103105.6313,
>  
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.INFO.20190521-103105.6313
>
>
> Impalad crashed while running a TPC-DS 10 TB run against S3.   Excerpt from 
> the stack trace (hs_err log file attached with more complete stack):
> {noformat}
> Stack: [0x7f3d095bc000,0x7f3d09dbc000],  sp=0x7f3d09db9050,  free 
> space=8180k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [impalad+0x2528a33]  
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int)+0x133
> C  [impalad+0x2528e0f]  tcmalloc::ThreadCache::Scavenge()+0x3f
> C  [impalad+0x266468a]  operator delete(void*)+0x32a
> C  [libcrypto.so.10+0x6e70d]  CRYPTO_free+0x1d
> J 5709  org.wildfly.openssl.SSLImpl.freeBIO0(J)V (0 bytes) @ 
> 0x7f3d4dadf9f9 [0x7f3d4dadf940+0xb9]
> J 5708 C1 org.wildfly.openssl.SSLImpl.freeBIO(J)V (5 bytes) @ 
> 0x7f3d4dfd0dfc [0x7f3d4dfd0d80+0x7c]
> J 5158 C1 org.wildfly.openssl.OpenSSLEngine.shutdown()V (78 bytes) @ 
> 0x7f3d4de4fe2c [0x7f3d4de4f720+0x70c]
> J 5758 C1 org.wildfly.openssl.OpenSSLEngine.closeInbound()V (51 bytes) @ 
> 0x7f3d4de419cc [0x7f3d4de417c0+0x20c]
> J 2994 C2 
> org.wildfly.openssl.OpenSSLEngine.unwrap(Ljava/nio/ByteBuffer;[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult;
>  (892 bytes) @ 0x7f3d4db8da34 [0x7f3d4db8c900+0x1134]
> J 3161 C2 org.wildfly.openssl.OpenSSLSocket.read([BII)I (810 bytes) @ 
> 0x7f3d4dd64cb0 [0x7f3d4dd646c0+0x5f0]
> J 5090 C2 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer()I
>  (97 bytes) @ 0x7f3d4ddd9ee0 [0x7f3d4ddd9e40+0xa0]
> J 5846 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.fillInputBuffer(I)I
>  (48 bytes) @ 0x7f3d4d7acb24 [0x7f3d4d7ac7a0+0x384]
> J 5845 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.isStale()Z (31 
> bytes) @ 0x7f3d4d7ad49c [0x7f3d4d7ad220+0x27c]
> {noformat}
> The crash may not be easy to reproduce.  I've run this test multiple times 
> and only crashed once.   I have a core file if needed.






[jira] [Created] (IMPALA-8725) Improve usability when HMS is configured with strict managed tables

2019-06-27 Thread Anurag Mantripragada (JIRA)
Anurag Mantripragada created IMPALA-8725:


 Summary: Improve usability when HMS is configured with strict 
managed tables
 Key: IMPALA-8725
 URL: https://issues.apache.org/jira/browse/IMPALA-8725
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Anurag Mantripragada


Users often create and query managed tables, and when HMS is configured 
with strict managed tables they get: 
{code:java}
Table default.foo failed strict managed table checks due to the following 
reason: Table is marked as a managed table but is not transactional{code}
We should improve usability in these scenarios.




[jira] [Created] (IMPALA-8724) Don't run queries on unhealthy executor groups

2019-06-27 Thread Lars Volker (JIRA)
Lars Volker created IMPALA-8724:
---

 Summary: Don't run queries on unhealthy executor groups
 Key: IMPALA-8724
 URL: https://issues.apache.org/jira/browse/IMPALA-8724
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Lars Volker
Assignee: Lars Volker


After IMPALA-8484 we need to add a way to exclude executor groups that are only 
partially available. This will help to keep queries from running on partially 
started groups and in cases where some nodes of an executor group have failed.




[jira] [Work started] (IMPALA-8427) Document the behavior change in IMPALA-7800

2019-06-27 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8427 started by Alex Rodoni.
---
> Document the behavior change in IMPALA-7800
> ---
>
> Key: IMPALA-8427
> URL: https://issues.apache.org/jira/browse/IMPALA-8427
> Project: IMPALA
>  Issue Type: Task
>  Components: Clients
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>
> IMPALA-7800 changes the default behavior of client connection timeout with 
> HS2 and Beeswax Thrift servers. Quote from the commit message:
> {noformat}
> The current implementation of the FE thrift server waits
> indefinitely to open the new session, if the maximum number of
> FE service threads specified by --fe_service_threads has been
> allocated.
> This patch introduces a startup flag to control how the server
> should treat new connection requests if we have run out of the
> configured number of server threads.
> If --accepted_client_cnxn_timeout > 0, new connection requests are
> rejected by the server if we can't get a server thread within
> the specified timeout.
> We set the default timeout to be 5 minutes. The old behavior
> can be restored by setting --accepted_client_cnxn_timeout=0,
> i.e., no timeout. The timeout applies only to client facing thrift
> servers, i.e., HS2 and Beeswax servers.
> {noformat}






[jira] [Updated] (IMPALA-8723) Impala Doc: Upgrading to 3.3

2019-06-27 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-8723:

Description: https://issues.apache.org/jira/browse/IMPALA-8427

> Impala Doc: Upgrading to 3.3
> 
>
> Key: IMPALA-8723
> URL: https://issues.apache.org/jira/browse/IMPALA-8723
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>
> https://issues.apache.org/jira/browse/IMPALA-8427






[jira] [Created] (IMPALA-8723) Impala Doc: Upgrading to 3.3

2019-06-27 Thread Alex Rodoni (JIRA)
Alex Rodoni created IMPALA-8723:
---

 Summary: Impala Doc: Upgrading to 3.3
 Key: IMPALA-8723
 URL: https://issues.apache.org/jira/browse/IMPALA-8723
 Project: IMPALA
  Issue Type: Task
  Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni








[jira] [Resolved] (IMPALA-8629) Adjust new KuduStorageHandler package

2019-06-27 Thread Hao Hao (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Hao resolved IMPALA-8629.
-
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Adjust new KuduStorageHandler package
> -
>
> Key: IMPALA-8629
> URL: https://issues.apache.org/jira/browse/IMPALA-8629
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog, Frontend
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> Before releasing with the updated KuduStorageHandler, we should change the 
> new KuduStorageHandler package from “org.apache.kudu.hive” to 
> “org.apache.hadoop.hive.kudu”.
>  
> This should be done to ensure the stand-in storage handler can become a real 
> storage handler when a Hive integration is added in the future. The 
> “org.apache.hadoop.hive” package is the standard package that all Hive storage 
> handlers live under.
>  
> Additionally the stand-in format details defined 
> [here|https://github.com/apache/impala/blob/2bce974990e19788ec359deec50f06d44ec92048/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java#L70]
>  should be updated as well. Values for those entries should be:
> {noformat}
> org.apache.hadoop.hive.kudu.KuduInputFormat
> org.apache.hadoop.hive.kudu.KuduOutputFormat 
> org.apache.hadoop.hive.kudu.KuduSerDe 
> {noformat}
>  
> I have a WIP patch for HIVE-12971 and used that patch to validate that using 
> "correct" stand-in values would allow Hive to read HMS tables/entries created 
> by Impala. 
>  
> Note: This patch will need to be committed after the Kudu side patch is 
> committed, and the Kudu build/version may need to be updated in Impala. A 
> review for the Kudu side change is here: 
> [https://gerrit.cloudera.org/#/c/13540/]






[jira] [Resolved] (IMPALA-8502) Improve Kudu/Impala integration based on Kudu/HMS/Sentry integration

2019-06-27 Thread Hao Hao (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Hao resolved IMPALA-8502.
-
Resolution: Fixed

> Improve Kudu/Impala integration based on Kudu/HMS/Sentry integration
> 
>
> Key: IMPALA-8502
> URL: https://issues.apache.org/jira/browse/IMPALA-8502
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog, Frontend
>Reporter: Hao Hao
>Assignee: Hao Hao
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> This is an umbrella JIRA to correspond to the [design 
> doc|https://docs.google.com/document/d/1HBEGEkrkHSd6h4qQ7O1nfySad9Hzyujf3_dYi5_M6eA/edit?usp=sharing]
>  that proposes to adapt Kudu/Impala integration accordingly with Kudu’s 
> integration with the HMS and Sentry.




[jira] [Work started] (IMPALA-8641) Document compression codec zstd in Parquet

2019-06-27 Thread Abhishek Rawat (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8641 started by Abhishek Rawat.
--
> Document compression codec zstd in Parquet
> --
>
> Key: IMPALA-8641
> URL: https://issues.apache.org/jira/browse/IMPALA-8641
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Abhishek Rawat
>Assignee: Abhishek Rawat
>Priority: Minor
>  Labels: future_release_doc, in_33
>







[jira] [Closed] (IMPALA-8495) Impala Doc: Document Data Read Cache

2019-06-27 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8495.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Impala Doc: Document Data Read Cache
> 
>
> Key: IMPALA-8495
> URL: https://issues.apache.org/jira/browse/IMPALA-8495
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
> Fix For: Impala 3.3.0
>
>
> IMPALA-8341 introduces a data cache for remote reads. In particular, it 
> caches data for non-local reads (e.g. S3, ABFS, ADLS). The data cache can be 
> enabled by setting the startup flag 
> {{--data_cache=<dir1>,<dir2>,...,<dirN>:<quota>}}, in which 
> {{<dir1>,...,<dirN>}} are directories on the local filesystem and {{<quota>}} is 
> the storage consumption quota for each directory. Note that multiple Impala 
> daemons running on the same host *must not* share cache directories.




[jira] [Created] (IMPALA-8722) test_hbase_col_filter failure

2019-06-27 Thread Bikramjeet Vig (JIRA)
Bikramjeet Vig created IMPALA-8722:
--

 Summary: test_hbase_col_filter failure
 Key: IMPALA-8722
 URL: https://issues.apache.org/jira/browse/IMPALA-8722
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 3.3.0
Reporter: Bikramjeet Vig
Assignee: Vihang Karajgaonkar


test_hbase_col_filter with the following exec params failed in one of 
the exhaustive runs:
{noformat}
query_test.test_hbase_queries.TestHBaseQueries.test_hbase_col_filter[protocol: 
beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
hbase/none]
{noformat}
From the logs it seems like the insert query completed around 23:27:42, 
while the invalidate metadata in Impala was run around 23:27:32.

I am not sure if this is due to the log being buffered and written later, but 
if the insert finished after the invalidate metadata, then the refresh probably 
didn't pick up the necessary file data, and hence the query that expected 3 rows 
didn't get any. Note: the insert call is run in Hive using 
"self.run_stmt_in_hive(add_data, username='hdfs')"

hive server 2 logs:
{noformat}
2019-06-26T23:27:42,456  INFO [LocalJobRunner Map Task Executor #0] 
exec.TableScanOperator: Initializing operator TS[0]
2019-06-26T23:27:42,456  INFO [LocalJobRunner Map Task Executor #0] 
exec.SelectOperator: Initializing operator SEL[1]
2019-06-26T23:27:42,456  INFO [LocalJobRunner Map Task Executor #0] 
exec.SelectOperator: SELECT 
struct
2019-06-26T23:27:42,456  INFO [LocalJobRunner Map Task Executor #0] 
exec.FileSinkOperator: Initializing operator FS[2]
2019-06-26T23:27:42,465  INFO [LocalJobRunner Map Task Executor #0] 
hadoop.InternalParquetRecordReader: block read in memory in 17 ms. row count = 
2133979
2019-06-26T23:27:42,469  INFO [LocalJobRunner Map Task Executor #0] 
exec.FileSinkOperator: Using serializer : class 
org.apache.hadoop.hive.hbase.HBaseSerDe[[:key,cf:c:[k, c]:[string, string]]] 
and formatter : org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat@c73c2ac
2019-06-26T23:27:42,471  INFO [LocalJobRunner Map Task Executor #0] 
exec.FileSinkOperator: New Final Path: FS 
hdfs://localhost:20500/test-warehouse/test_hbase_col_filter_2598223d.db/_tmp.hbase_col_filter_testkeyx/00_0
2019-06-26T23:27:42,479 ERROR [LocalJobRunner Map Task Executor #0] 
hadoop.ParquetRecordReader: Can not initialize counter due to context is not a 
instance of TaskInputOutputContext, but is 
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
2019-06-26T23:27:42,482  INFO [LocalJobRunner Map Task Executor #0] 
hadoop.InternalParquetRecordReader: RecordReader initialized will read a total 
of 2142543 records.
2019-06-26T23:27:42,482  INFO [LocalJobRunner Map Task Executor #0] 
hadoop.InternalParquetRecordReader: at row 0. reading next block
2019-06-26T23:27:42,496  INFO [ReadOnlyZKClient-localhost:2181@0x4d49ce4e] 
zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 
sessionTimeout=9 
watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$47/372191955@532dae72
2019-06-26T23:27:42,497  INFO 
[ReadOnlyZKClient-localhost:2181@0x4d49ce4e-SendThread(localhost:2181)] 
zookeeper.ClientCnxn: Opening socket connection to server 
localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL 
(unknown error)
2019-06-26T23:27:42,498  INFO 
[ReadOnlyZKClient-localhost:2181@0x4d49ce4e-SendThread(localhost:2181)] 
zookeeper.ClientCnxn: Socket connection established, initiating session, 
client: /0:0:0:0:0:0:0:1:55090, server: localhost/0:0:0:0:0:0:0:1:2181
2019-06-26T23:27:42,499  INFO 
[ReadOnlyZKClient-localhost:2181@0x4d49ce4e-SendThread(localhost:2181)] 
zookeeper.ClientCnxn: Session establishment complete on server 
localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x16b96a782de00c5, negotiated 
timeout = 9
2019-06-26T23:27:42,503  INFO [LocalJobRunner Map Task Executor #0] 
exec.FileSinkOperator: FS[2]: records written - 1
2019-06-26T23:27:42,504  INFO [LocalJobRunner Map Task Executor #0] 
exec.MapOperator: MAP[0]: records read - 1
2019-06-26T23:27:42,504  INFO [LocalJobRunner Map Task Executor #0] 
exec.MapOperator: MAP[0]: Total records read - 1. abort - false
2019-06-26T23:27:42,504  INFO [LocalJobRunner Map Task Executor #0] 
exec.MapOperator: DESERIALIZE_ERRORS:0, RECORDS_IN:1, 
2019-06-26T23:27:42,504  INFO [LocalJobRunner Map Task Executor #0] 
exec.FileSinkOperator: FS[2]: records written - 1
2019-06-26T23:27:42,511  INFO [SpillThread] mapred.MapTask: Finished spill 3
2019-06-26T23:27:42,514  INFO [LocalJobRunner Map Task Executor #0] 
exec.FileSinkOperator: TOTAL_TABLE_ROWS_WRITTEN:1, 
RECORDS_OUT_1_test_hbase_col_filter_2598223d.hbase_col_filter_testkeyx:1, 
2019-06-26T23:27:42,517  INFO [LocalJobRunner Map Task Executor #0] 

[jira] [Comment Edited] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2019-06-27 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873765#comment-16873765
 ] 

Michael Ho edited comment on IMPALA-8712 at 6/27/19 6:59 PM:
-

On the other hand, {{exec_rpc_thread_pool_}} allows serialization of the RPC 
parameters to happen in parallel so it may not strictly be a simple conversion 
to asynchronous RPC without regression. So careful evaluation with huge RPC 
parameters (e.g. a large number of scan ranges) may be needed to see if there 
may be regression as a result.

Some of the serialization overhead with ExecQueryFInstance() RPC even after 
IMPALA-7467 is still Thrift related as we just serialize a bunch of Thrift 
structures into a binary blob and send them via KRPC sidecar. The serialization 
is done in parallel by threads in {{exec_rpc_thread_pool_}}. If we convert 
those Thrift structures into Protobuf, then the serialization can be done in 
parallel by reactor threads in the KRPC stack.


was (Author: kwho):
On the other hand, {{exec_rpc_thread_pool_}} allows serialization of the RPC 
parameters to happen in parallel so it may not strictly be a simple conversion 
to asynchronous RPC without regression. So careful evaluation with huge RPC 
parameters (e.g. a large number of scan ranges) may be needed to see if there 
may be regression as a result.

> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Also various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.






[jira] [Updated] (IMPALA-8663) FileMetadataLoader should skip listing files in hidden and tmp directories

2019-06-27 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-8663:

Labels: catalog-v2 impala-acid  (was: impala-acid)

> FileMetadataLoader should skip listing files in hidden and tmp directories
> --
>
> Key: IMPALA-8663
> URL: https://issues.apache.org/jira/browse/IMPALA-8663
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>  Labels: catalog-v2, impala-acid
>
> Currently, the file metadata loader recursively lists the table and partition 
> directories to get the FileStatuses. For each FileStatus we ignore hidden 
> files in {{FileSystemUtil.isValidDataFile()}}. However, that is not 
> sufficient. For instance, if Hive is inserting data into a table when the 
> refresh is called, it is possible the staging directory is present within the 
> table directory. This staging directory is a hidden directory with a name of 
> the form {{.hive-staging_*}}. It is possible that this directory contains files 
> which are not themselves hidden (i.e. do not start with a '.' or '_'). Such 
> files should be treated as temporary files and not as valid data files.
>  
> Another instance where we see this happen is with transactional tables, which 
> have a {{.manifest}} file located in a {{_tmp}} directory within the table 
> directory. This file should also be skipped and not considered a valid 
> data file.






[jira] [Commented] (IMPALA-8689) test_hive_impala_interop failing with "Timeout >7200s"

2019-06-27 Thread Abhishek Rawat (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874427#comment-16874427
 ] 

Abhishek Rawat commented on IMPALA-8689:


Updating the test case to not include the timestamp column. IMPALA-8721 has been 
opened to fix this issue.

> test_hive_impala_interop failing with "Timeout >7200s"
> --
>
> Key: IMPALA-8689
> URL: https://issues.apache.org/jira/browse/IMPALA-8689
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.3.0
>Reporter: Andrew Sherman
>Assignee: Abhishek Rawat
>Priority: Critical
>  Labels: broken-build
>
> I think this is the new test added in IMPALA-8617
> {code}
> custom_cluster/test_hive_parquet_codec_interop.py:78: in 
> test_hive_impala_interop
> .format(codec, hive_table, impala_table))
> common/impala_test_suite.py:871: in run_stmt_in_hive
> (stdout, stderr) = call.communicate()
> /usr/lib64/python2.7/subprocess.py:800: in communicate
> return self._communicate(input)
> /usr/lib64/python2.7/subprocess.py:1401: in _communicate
> stdout, stderr = self._communicate_with_poll(input)
> /usr/lib64/python2.7/subprocess.py:1455: in _communicate_with_poll
> ready = poller.poll()
> E   Failed: Timeout >7200s
> {code}






[jira] [Created] (IMPALA-8721) Wrong result when Impala reads a Hive written parquet TimeStamp column

2019-06-27 Thread Abhishek Rawat (JIRA)
Abhishek Rawat created IMPALA-8721:
--

 Summary: Wrong result when Impala reads a Hive written parquet 
TimeStamp column
 Key: IMPALA-8721
 URL: https://issues.apache.org/jira/browse/IMPALA-8721
 Project: IMPALA
  Issue Type: Bug
Reporter: Abhishek Rawat
 Fix For: Impala 3.3.0


 

Easy to repro on latest upstream:
{code:java}
hive> create table t1_hive(c1 timestamp) stored as parquet;
hive> insert into t1_hive values('2009-03-09 01:20:03.6');
hive> select * from t1_hive;
OK
2009-03-09 01:20:03.6
[localhost:21000] default> invalidate metadata t1_hive;
[localhost:21000] default> select * from t1_hive;
Query: select * from t1_hive
Query submitted at: 2019-06-24 09:55:36 (Coordinator: 
http://optimus-prime:25000)
Query progress can be monitored at: 
http://optimus-prime:25000/query_plan?query_id=b34f85cb5da29c26:d4dfcb24
+---+
| c1 |
+---+
| 2009-03-09 09:20:03.6 |  select * from t1_hive;
Query: select * from t1_hive
Query submitted at: 2019-06-24 10:00:22 (Coordinator: 
http://optimus-prime:25000)
Query progress can be monitored at: 
http://optimus-prime:25000/query_plan?query_id=d5428bb21fb259b9:7b107034
+---+
| c1 |
+---+
| 2009-03-09 02:20:03.6 |. <

[jira] [Work started] (IMPALA-8689) test_hive_impala_interop failing with "Timeout >7200s"

2019-06-27 Thread Abhishek Rawat (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8689 started by Abhishek Rawat.
--
> test_hive_impala_interop failing with "Timeout >7200s"
> --
>
> Key: IMPALA-8689
> URL: https://issues.apache.org/jira/browse/IMPALA-8689
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.3.0
>Reporter: Andrew Sherman
>Assignee: Abhishek Rawat
>Priority: Critical
>  Labels: broken-build
>
> I think this is the new test added in IMPALA-8617
> {code}
> custom_cluster/test_hive_parquet_codec_interop.py:78: in 
> test_hive_impala_interop
> .format(codec, hive_table, impala_table))
> common/impala_test_suite.py:871: in run_stmt_in_hive
> (stdout, stderr) = call.communicate()
> /usr/lib64/python2.7/subprocess.py:800: in communicate
> return self._communicate(input)
> /usr/lib64/python2.7/subprocess.py:1401: in _communicate
> stdout, stderr = self._communicate_with_poll(input)
> /usr/lib64/python2.7/subprocess.py:1455: in _communicate_with_poll
> ready = poller.poll()
> E   Failed: Timeout >7200s
> {code}






[jira] [Work started] (IMPALA-5149) Provide query profile in JSON format

2019-06-27 Thread Jiawei Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-5149 started by Jiawei Wang.
---
> Provide query profile in JSON format
> 
>
> Key: IMPALA-5149
> URL: https://issues.apache.org/jira/browse/IMPALA-5149
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Greg Rahn
>Assignee: Jiawei Wang
>Priority: Major
>
> Today there is a text and Thrift version of the query profile, but it would 
> be useful to have a JSON version for portability and machine consumption.
> It may also make sense to make some organizational changes to the profile to 
> make analysis easier.






[jira] [Assigned] (IMPALA-5149) Provide query profile in JSON format

2019-06-27 Thread Jiawei Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiawei Wang reassigned IMPALA-5149:
---

Assignee: Jiawei Wang

> Provide query profile in JSON format
> 
>
> Key: IMPALA-5149
> URL: https://issues.apache.org/jira/browse/IMPALA-5149
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Greg Rahn
>Assignee: Jiawei Wang
>Priority: Major
>
> Today there is a text and Thrift version of the query profile, but it would 
> be useful to have a JSON version for portability and machine consumption.
> It may also make sense to make some organizational changes to the profile to 
> make analysis easier.






[jira] [Resolved] (IMPALA-7369) Implement DATE builtin functions

2019-06-27 Thread Attila Jeges (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges resolved IMPALA-7369.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Implement DATE builtin functions
> 
>
> Key: IMPALA-7369
> URL: https://issues.apache.org/jira/browse/IMPALA-7369
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Attila Jeges
>Assignee: Attila Jeges
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> - Built-in functions supported in Hive should be implemented in Impala as 
> well.
> - Already implemented TIMESTAMP built-in functions that work on the date part 
> of timestamps should be implemented for DATE types too.




[jira] [Resolved] (IMPALA-8665) Include extra info in error message when date cast fails

2019-06-27 Thread Attila Jeges (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges resolved IMPALA-8665.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Include extra info in error message when date cast fails
> 
>
> Key: IMPALA-8665
> URL: https://issues.apache.org/jira/browse/IMPALA-8665
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Gabor Kaszab
>Assignee: Jiawei Wang
>Priority: Major
>  Labels: newbie, supportability
> Fix For: Impala 3.3.0
>
>
> {code:java}
> select cast("2000" as date); 
> ERROR: UDF ERROR: String to Date parse failed.
> {code}
>  If we assume the user has millions of rows in a table and casts them, it's 
> hard to debug which one of them made the cast fail, as there is no indication 
> in the error message. Let's include at least the input value of the cast in 
> the message (see the sketch after the quoted code below).
> Here we have everything available to do so:
> https://github.com/apache/impala/blob/94652d74521e95e8606ea2d22aabcaddde6fc471/be/src/exprs/cast-functions-ir.cc#L313
> {code:java}
> DateVal CastFunctions::CastToDateVal(FunctionContext* ctx, const StringVal& 
> val) {
>   if (val.is_null) return DateVal::null();
>   DateValue dv = DateValue::Parse(reinterpret_cast<const char*>(val.ptr), 
>       val.len, true);
>   if (UNLIKELY(!dv.IsValid())) {
> ctx->SetError("String to Date parse failed.");
> return DateVal::null();
>   }
>   return dv.ToDateVal();
> }
> {code}
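
A minimal sketch of what the suggested change might look like, assuming plain std::string concatenation is acceptable on the error path; this is illustrative only and not necessarily the committed fix:

{code:cpp}
// Hedged sketch: echo the offending input back so a failing row can be located.
DateVal CastFunctions::CastToDateVal(FunctionContext* ctx, const StringVal& val) {
  if (val.is_null) return DateVal::null();
  DateValue dv = DateValue::Parse(reinterpret_cast<const char*>(val.ptr),
      val.len, true);
  if (UNLIKELY(!dv.IsValid())) {
    const std::string input(reinterpret_cast<const char*>(val.ptr), val.len);
    ctx->SetError(
        ("String to Date parse failed. Invalid string value: '" + input + "'").c_str());
    return DateVal::null();
  }
  return dv.ToDateVal();
}
{code}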





[jira] [Updated] (IMPALA-8714) Metrics for tracking query failure reasons

2019-06-27 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8714:
--
Description: 
A query can fail for various reasons when run in Impala:
- analysis failure (e.g. SQL syntax mistake)
- access right issue (e.g. no privilege for certain ops on a table)
- memory limit exceeded
- scratch limit exceeded
- EXEC_TIME_LIMIT_S exceeded
- other guard rail exceeded
- executor crashes
- rpc failures
- corrupted data files 
- client disconnected or session timed out

This JIRA tracks the effort to explicitly classify query failures into some 
high level categories and expose them in the metrics. This should provide us a 
clearer view of what types of failures contribute the most to bad user 
experience for a given cluster.

  was:
A query can fail for various reasons when run in Impala:
- analysis failure (e.g. SQL syntax mistake)
- access right issue (e.g. no privilege for certain ops on a table)
- memory limit exceeded
- scratch limit exceeded
- executor crashes
- rpc failures
- corrupted data files 

This JIRA tracks the effort to explicitly classify query failures into some 
high level categories and expose them in the metrics. This should provide us a 
clearer view of what types of failures contribute the most to bad user 
experience for a given cluster.


> Metrics for tracking query failure reasons
> --
>
> Key: IMPALA-8714
> URL: https://issues.apache.org/jira/browse/IMPALA-8714
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Priority: Major
>  Labels: observability
>
> A query can fail for various reasons when run in Impala:
> - analysis failure (e.g. SQL syntax mistake)
> - access right issue (e.g. no privilege for certain ops on a table)
> - memory limit exceeded
> - scratch limit exceeded
> - EXEC_TIME_LIMIT_S exceeded
> - other guard rail exceeded
> - executor crashes
> - rpc failures
> - corrupted data files 
> - client disconnected or session timed out
> This JIRA tracks the effort to explicitly classify query failures into some 
> high level categories and expose them in the metrics. This should provide us 
> a clearer view of what types of failures contribute the most to bad user 
> experience for a given cluster.






[jira] [Updated] (IMPALA-8714) Metrics for tracking query failure reasons

2019-06-27 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8714:
--
Description: 
A query can fail for various reasons when run in Impala:
- analysis failure (e.g. SQL syntax mistake)
- access right issue (e.g. no privilege for certain ops on a table)
- memory limit exceeded
- scratch limit exceeded
- executor crashes
- rpc failures
- corrupted data files 

This JIRA tracks the effort to explicitly classify query failures into some 
high level categories and expose them in the metrics. This should provide us a 
clearer view of what types of failures contribute the most to bad user 
experience for a given cluster.

  was:
A query can fail for various reasons when run in Impala:
- analysis failure (e.g. SQL syntax mistake)
- access right issue (e.g. no privilege for certain ops on a table)
- memory limit exceeded
- executor crashes
- rpc failures
- corrupted data files 

This JIRA tracks the effort to explicitly classify query failures into some 
high level categories and expose them in the metrics. This should provide us a 
clearer view of what types of failures contribute the most to bad user 
experience for a given cluster.


> Metrics for tracking query failure reasons
> --
>
> Key: IMPALA-8714
> URL: https://issues.apache.org/jira/browse/IMPALA-8714
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Priority: Major
>  Labels: observability
>
> A query can fail for various reasons when run in Impala:
> - analysis failure (e.g. SQL syntax mistake)
> - access right issue (e.g. no privilege for certain ops on a table)
> - memory limit exceeded
> - scratch limit exceeded
> - executor crashes
> - rpc failures
> - corrupted data files 
> This JIRA tracks the effort to explicitly classify query failures into some 
> high level categories and expose them in the metrics. This should provide us 
> a clearer view of what types of failures contribute the most to bad user 
> experience for a given cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8462) Get exhaustive tests passing with dockerised minicluster

2019-06-27 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8462:
--
Target Version: Impala 3.3.0

> Get exhaustive tests passing with dockerised minicluster
> 
>
> Key: IMPALA-8462
> URL: https://issues.apache.org/jira/browse/IMPALA-8462
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8451) Default configs for admission control

2019-06-27 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8451:
--
Target Version: Impala 3.3.0

> Default configs for admission control
> -
>
> Key: IMPALA-8451
> URL: https://issues.apache.org/jira/browse/IMPALA-8451
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> We probably want to have some basic admission control enabled for the 
> dockerised containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8630) Consistent remote placement should include partition information when calculating placement

2019-06-27 Thread Joe McDonnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-8630.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Consistent remote placement should include partition information when 
> calculating placement
> ---
>
> Key: IMPALA-8630
> URL: https://issues.apache.org/jira/browse/IMPALA-8630
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
> Fix For: Impala 3.3.0
>
>
> For partitioned tables, the actual filenames within partitions may not have 
> much entropy. Impala includes information in its filenames that would not be 
> the same across partitions, but identical filenames across partitions are 
> common for tables written by the current CDH version of Hive. For example, in 
> our minicluster, the TPC-DS store_sales table has many partitions, but the 
> actual filenames within partitions are very simple:
> {noformat}
> hdfs dfs -ls /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452642
> Found 1 items
> -rwxr-xr-x 3 joe supergroup 379535 2019-06-05 15:16 
> /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452642/00_0
> hdfs dfs -ls /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452640
> Found 1 items
> -rwxr-xr-x 3 joe supergroup 412959 2019-06-05 15:16 
> /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452640/00_0{noformat}
> Right now, consistent remote placement uses the filename+offset without the 
> partition id.
> {code:java}
> uint32_t hash = HashUtil::Hash(hdfs_file_split->relative_path.data(),
>   hdfs_file_split->relative_path.length(), 0);
> {code}
> This would produce a poor balance of files across nodes when there is low 
> entropy in filenames. This should be amended to include the partition id, 
> which is already accessible on the THdfsFileSplit.
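
As a rough illustration of the problem (standalone code, not Impala's;
std::hash stands in for HashUtil::Hash and the file name is made up), a hash
over the relative file name alone sends every partition's identically named
file to the same executor:

{code:cpp}
// Sketch: low-entropy file names defeat a hash that ignores the partition.
#include <functional>
#include <iostream>
#include <string>

int main() {
  std::hash<std::string> hasher;
  const int num_executors = 3;
  // The same relative file name under two different partition directories.
  std::string split_a = "000000_0";  // under partition ss_sold_date_sk=2452640
  std::string split_b = "000000_0";  // under partition ss_sold_date_sk=2452642
  std::cout << "split_a -> executor " << hasher(split_a) % num_executors << "\n"
            << "split_b -> executor " << hasher(split_b) % num_executors << "\n";
  // Both splits always land on the same executor because the hash input is
  // identical, no matter how many executors are available.
  return 0;
}
{code}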



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IMPALA-8630) Consistent remote placement should include partition information when calculating placement

2019-06-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874362#comment-16874362
 ] 

ASF subversion and git services commented on IMPALA-8630:
-

Commit a20977a5c0b37d20aab13bb441756f887a2d1c59 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a20977a ]

IMPALA-8630: Hash the full path when calculating consistent remote placement

Consistent remote placement currently uses the relative filename within
a partition for the consistent hash. If the relative filenames for
different partitions have a simple naming scheme, then multiple
partitions may have files of the same name. This is true for some
tables written by Hive (e.g. in our minicluster the tpcds.store_sales table
has this problem). This can lead to unbalanced placement of remote
ranges.

This adds a partition_path_hash to the THdfsFileSplit and
THdfsFileSplitGeneratorSpec, calculated in the frontend (which has all of
the partition information). The scheduler hashes this in addition to
the relative path.

Testing:
 - Added several new scheduler tests that verify the consistent remote
   scheduling sees blocks with different relative paths, partition paths,
   or offsets as distinct.
 - Ran core tests

Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
Reviewed-on: http://gerrit.cloudera.org:8080/13545
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
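
A rough sketch of the idea under stated assumptions (std::hash stands in for
HashUtil::Hash, and the mixing function here is arbitrary, not the one the
patch uses): fold the precomputed partition-path hash in alongside the relative
path so that identical file names in different partitions hash apart.

{code:cpp}
// Sketch only: combine a precomputed partition-path hash with the file name.
#include <cstdint>
#include <functional>
#include <string>

uint32_t CombinedSplitHash(uint32_t partition_path_hash,
                           const std::string& relative_path) {
  uint64_t file_hash = std::hash<std::string>{}(relative_path);
  // Any reasonable mix works for the sketch; FNV-style multiply and xor here.
  uint64_t combined =
      static_cast<uint64_t>(partition_path_hash) * 1099511628211ULL ^ file_hash;
  return static_cast<uint32_t>(combined ^ (combined >> 32));
}
{code}

With the partition-path hash in the mix, the two splits from the earlier
example hash to different values even though their relative paths are the same.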


> Consistent remote placement should include partition information when 
> calculating placement
> ---
>
> Key: IMPALA-8630
> URL: https://issues.apache.org/jira/browse/IMPALA-8630
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
>
> For partitioned tables, the actual filenames within partitions may not have 
> much entropy. Impala includes information in its filenames that would not be 
> the same across partitions, but identical filenames across partitions are 
> common for tables written by the current CDH version of Hive. For example, in 
> our minicluster, the TPC-DS store_sales table has many partitions, but the 
> actual filenames within partitions are very simple:
> {noformat}
> hdfs dfs -ls /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452642
> Found 1 items
> -rwxr-xr-x 3 joe supergroup 379535 2019-06-05 15:16 
> /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452642/00_0
> hdfs dfs -ls /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452640
> Found 1 items
> -rwxr-xr-x 3 joe supergroup 412959 2019-06-05 15:16 
> /test-warehouse/tpcds.store_sales/ss_sold_date_sk=2452640/00_0{noformat}
> Right now, consistent remote placement uses the filename+offset without the 
> partition id.
> {code:java}
> uint32_t hash = HashUtil::Hash(hdfs_file_split->relative_path.data(),
>   hdfs_file_split->relative_path.length(), 0);
> {code}
> This would produce a poor balance of files across nodes when there is low 
> entropy in filenames. This should be amended to include the partition id, 
> which is already accessible on the THdfsFileSplit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8665) Include extra info in error message when date cast fails

2019-06-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874361#comment-16874361
 ] 

ASF subversion and git services commented on IMPALA-8665:
-

Commit dbba52c77c4bd83479fc44c323d76ec728b17aeb in impala's branch 
refs/heads/master from Jiawei Wang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=dbba52c ]

IMPALA-8665:Include extra info in error message when date cast fails

This change extends the error message Impala yields when casting STRING
to DATE (explicitly or implicitly) fails. The new error message includes
the violating string value.

Testing:
changes -> date-partitioning.test & date.test
query_test/test_date_queries.py test passed

Example:
select cast('20' as date);
ERROR: UDF ERROR: String to Date parse failed. Invalid string val: "20"

Change-Id: If800b7696515cd61afee27220c55ff2440a86f04
Reviewed-on: http://gerrit.cloudera.org:8080/13680
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Include extra info in error message when date cast fails
> 
>
> Key: IMPALA-8665
> URL: https://issues.apache.org/jira/browse/IMPALA-8665
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Gabor Kaszab
>Assignee: Jiawei Wang
>Priority: Major
>  Labels: newbie, supportability
>
> {code:java}
> select cast("2000" as date); 
> ERROR: UDF ERROR: String to Date parse failed.
> {code}
>  If we assume the user has millions of rows in a table and makes a cast on 
> them it's hard to debug which one of the made the cast fail as there is no 
> indication in the error message. Let's include at least the input value of 
> the cast to the message.
> Here we have everything available to do so:
> https://github.com/apache/impala/blob/94652d74521e95e8606ea2d22aabcaddde6fc471/be/src/exprs/cast-functions-ir.cc#L313
> {code:java}
> DateVal CastFunctions::CastToDateVal(FunctionContext* ctx, const StringVal& 
> val) {
>   if (val.is_null) return DateVal::null();
>   DateValue dv = DateValue::Parse(
>       reinterpret_cast<const char*>(val.ptr), val.len, true);
>   if (UNLIKELY(!dv.IsValid())) {
> ctx->SetError("String to Date parse failed.");
> return DateVal::null();
>   }
>   return dv.ToDateVal();
> }
> {code}
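
A standalone sketch of the kind of message the change produces; std::string
replaces Impala's FunctionContext/StringVal, the helper name is made up, and
the truncation cap is an arbitrary choice for the sketch, not from the patch:

{code:cpp}
// Sketch: build a parse-failure message that carries the offending input,
// capping very long values so the error stays readable.
#include <algorithm>
#include <iostream>
#include <string>

std::string DateParseErrorMsg(const char* ptr, int len) {
  constexpr int kMaxShown = 64;  // arbitrary cap for the sketch
  std::string shown(ptr, std::min(len, kMaxShown));
  std::string msg =
      "String to Date parse failed. Invalid string val: \"" + shown + "\"";
  if (len > kMaxShown) msg += "...";
  return msg;
}

int main() {
  std::string bad = "20";
  std::cout << DateParseErrorMsg(bad.data(), static_cast<int>(bad.size()))
            << std::endl;
  // Prints: String to Date parse failed. Invalid string val: "20"
  return 0;
}
{code}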



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8648) Impala ACID read stress tests

2019-06-27 Thread Csaba Ringhofer (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874335#comment-16874335
 ] 

Csaba Ringhofer commented on IMPALA-8648:
-

https://gerrit.cloudera.org/#/c/13751/

> Impala ACID read stress tests
> -
>
> Key: IMPALA-8648
> URL: https://issues.apache.org/jira/browse/IMPALA-8648
> Project: IMPALA
>  Issue Type: Test
>Reporter: Dinesh Garg
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: impala-acid
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-5765) Flaky tpc-ds data loading

2019-06-27 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5765.
---
Resolution: Won't Fix

This hasn't happened for a long time, and at the current frequency I don't think 
we would realistically put in the effort to work around the Hive issue.

> Flaky tpc-ds data loading
> -
>
> Key: IMPALA-5765
> URL: https://issues.apache.org/jira/browse/IMPALA-5765
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.10.0
>Reporter: Matthew Jacobs
>Assignee: Philip Zeyliger
>Priority: Major
>  Labels: flaky
>
> Saw this on a number of gerrit-verify-dryrun jobs:
> {code}
> 23:49:37 Loading TPC-DS data (logging to 
> /home/ubuntu/Impala/logs/data_loading/load-tpcds.log)... 
> 23:55:39 FAILED (Took: 6 min 2 sec)
> 23:55:39 'load-data tpcds core' failed. Tail of log:
> 23:55:39 ss_net_profit,
> 23:55:39 ss_sold_date_sk
> 23:55:39 from store_sales_unpartitioned
> 23:55:39 WHERE ss_sold_date_sk < 2451272
> 23:55:39 distribute by ss_sold_date_sk
> 23:55:39 INFO  : Query ID = 
> ubuntu_2017073123_26963c6a-a58b-4cad-b0c7-c3790f9b22dc
> 23:55:39 INFO  : Total jobs = 1
> 23:55:39 INFO  : Launching Job 1 out of 1
> 23:55:39 INFO  : Starting task [Stage-1:MAPRED] in serial mode
> 23:55:39 INFO  : Number of reduce tasks not specified. Estimated from input 
> data size: 2
> 23:55:39 INFO  : In order to change the average load for a reducer (in bytes):
> 23:55:39 INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
> 23:55:39 INFO  : In order to limit the maximum number of reducers:
> 23:55:39 INFO  :   set hive.exec.reducers.max=<number>
> 23:55:39 INFO  : In order to set a constant number of reducers:
> 23:55:39 INFO  :   set mapreduce.job.reduces=<number>
> 23:55:39 INFO  : number of splits:2
> 23:55:39 INFO  : Submitting tokens for job: job_local1252085428_0826
> 23:55:39 INFO  : The url to track the job: http://localhost:8080/
> 23:55:39 INFO  : Job running in-process (local Hadoop)
> 23:55:39 INFO  : 2017-07-31 23:55:06,606 Stage-1 map = 0%,  reduce = 0%
> 23:55:39 INFO  : 2017-07-31 23:55:13,609 Stage-1 map = 100%,  reduce = 0%
> 23:55:39 INFO  : 2017-07-31 23:55:28,621 Stage-1 map = 100%,  reduce = 33%
> 23:55:39 ERROR : Ended Job = job_local1252085428_0826 with errors
> 23:55:39 ERROR : FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> 23:55:39 INFO  : MapReduce Jobs Launched: 
> 23:55:39 INFO  : Stage-Stage-1:  HDFS Read: 26483258512 HDFS Write: 
> 19378762131 FAIL
> 23:55:39 INFO  : Total MapReduce CPU Time Spent: 0 msec
> 23:55:39 INFO  : Completed executing 
> command(queryId=ubuntu_2017073123_26963c6a-a58b-4cad-b0c7-c3790f9b22dc); 
> Time taken: 33.276 seconds
> 23:55:39 Error: Error while processing statement: FAILED: Execution Error, 
> return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask 
> (state=08S01,code=2)
> 23:55:39 java.sql.SQLException: Error while processing statement: FAILED: 
> Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> 23:55:39  at 
> org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:292)
> 23:55:39  at 
> org.apache.hive.beeline.Commands.executeInternal(Commands.java:989)
> 23:55:39  at org.apache.hive.beeline.Commands.execute(Commands.java:1203)
> 23:55:39  at org.apache.hive.beeline.Commands.sql(Commands.java:1117)
> 23:55:39  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1176)
> 23:55:39  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1010)
> 23:55:39  at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:987)
> 23:55:39  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:914)
> 23:55:39  at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518)
> 23:55:39  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
> 23:55:39  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 23:55:39  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 23:55:39  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 23:55:39  at java.lang.reflect.Method.invoke(Method.java:606)
> 23:55:39  at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> 23:55:39  at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 23:55:39 
> 23:55:39 Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
> 23:55:39 Error executing file from Hive: load-tpcds-core-hive-generated.sql
> 23:55:39 Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at 
> line 48: LOAD_DATA_ARGS=""
> {code}
> https://jenkins.impala.io/job/ubuntu-14.04-from-scratch/1827/
> It's been reported a few times in the last week. Here's another 


[jira] [Created] (IMPALA-8720) Impala frontend jar should not depend on Sentry jars when building against hive-3 profile

2019-06-27 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created IMPALA-8720:
---

 Summary: Impala frontend jar should not depend on Sentry jars when 
building against hive-3 profile
 Key: IMPALA-8720
 URL: https://issues.apache.org/jira/browse/IMPALA-8720
 Project: IMPALA
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


It looks like for {{hive-3}} based setups, the frontend jar still depends on Sentry 
jars. However, Sentry does not work with HMS-3 as of today. This unnecessarily 
pulls in Sentry jars from Maven repositories when building against CDP. We 
should pull in the Sentry jars only when they are needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7787) python26-incompatibility-check failed because of docker 503 Service Unavailable

2019-06-27 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7787.
---
   Resolution: Fixed
Fix Version/s: Not Applicable

This hasn't happened for a while; maybe Phil's change did the trick.

> python26-incompatibility-check failed because of docker 503 Service 
> Unavailable
> ---
>
> Key: IMPALA-7787
> URL: https://issues.apache.org/jira/browse/IMPALA-7787
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Philip Zeyliger
>Priority: Major
>  Labels: flaky
> Fix For: Not Applicable
>
>
> https://jenkins.impala.io/job/python26-incompatibility-check/529
> https://jenkins.impala.io/job/python26-incompatibility-check/528
> {noformat}
> 15:50:37 Initialized empty Git repository in /tmp/tmp.MKJUMZ3SBi/.git/
> 15:50:37 + git fetch http://gerrit.cloudera.org:8080/Impala-ASF 
> refs/changes/00/11800/5
> 15:50:54 From http://gerrit.cloudera.org:8080/Impala-ASF
> 15:50:54  * branchrefs/changes/00/11800/5 -> FETCH_HEAD
> 15:50:54 + git archive --prefix=impala/ -o /tmp/impala.tar FETCH_HEAD
> 15:50:54 + docker run -u nobody -v /tmp/impala.tar:/tmp/impala.tar centos:6 
> bash -o pipefail -c 'cd /tmp; python -c '\''import 
> tarfile;tarfile.TarFile("/tmp/impala.tar").extractall()'\''; python -m 
> compileall /tmp/impala'
> 15:50:54 Unable to find image 'centos:6' locally
> 15:50:55 docker: Error response from daemon: Get 
> https://registry-1.docker.io/v2/library/centos/manifests/6: received 
> unexpected HTTP status: 503 Service Unavailable.
> 15:50:55 See 'docker run --help'.
> 15:50:55 Build step 'Execute shell' marked build as failure
> 15:50:55 Set build name.
> 15:50:55 New build name is '#529 refs/changes/00/11800/5'
> 15:50:57 Finished: FAILURE
> {noformat}
> This happened a couple of times. Looks like flakiness but unsure if it was 
> just a transient infra issue or something we're doing wrong.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-4631) plan-fragment-executor.cc:518] Check failed: other_time <= total_time (25999394 vs. 25999393)

2019-06-27 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-4631:
-

Assignee: Tim Armstrong

> plan-fragment-executor.cc:518] Check failed: other_time <= total_time 
> (25999394 vs. 25999393)
> -
>
> Key: IMPALA-4631
> URL: https://issues.apache.org/jira/browse/IMPALA-4631
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.8.0
>Reporter: Dan Hecht
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: flaky
>
> This dcheck occasionally fires:
> {code}
> impalad.FATAL:F1201 22:35:58.617157 30293 plan-fragment-executor.cc:518] 
> Check failed: other_time <= total_time (25999394 vs. 25999393)
> {code}
> I suspect the problem is with using floating point operations in places like 
> this:
> {code}
>   timespec ts;
>   clock_gettime(OsInfo::fast_clock(), &ts);
> return ts.tv_sec * 1e9 + ts.tv_nsec;
> {code}
> and because floating point doesn't distribute, and we can end up with 
> {noformat} c * (a + b) < c * a + c * b {noformat} which is effectively what 
> the dcheck does.
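
A small self-contained demonstration of the suspected rounding (the seconds
value is illustrative and this is not the actual fix): near 1.5e18 nanoseconds
a double has only 53 mantissa bits, so the floating-point expression rounds
while 64-bit integer arithmetic stays exact, and timers converted slightly
differently can disagree by a few nanoseconds, which is all the DCHECK needs
to fire.

{code:cpp}
// Double vs. int64 nanosecond conversion: the double version rounds to a
// ~256 ns grid at this magnitude, the integer version is exact.
#include <cstdint>
#include <iostream>

int main() {
  int64_t tv_sec = 1561680000;   // illustrative epoch seconds
  int64_t tv_nsec = 617157001;
  double as_double = tv_sec * 1e9 + tv_nsec;           // rounds
  int64_t as_int = tv_sec * 1000000000LL + tv_nsec;     // exact
  std::cout << static_cast<int64_t>(as_double) - as_int << std::endl;
  return 0;
}
{code}

Keeping the timer totals in integer nanoseconds end to end (or tolerating a
one-count rounding slack in the check) could sidestep the inequality.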



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6046) test_partition_metadata_compatibility error: Hive query failing (HADOOP-13809)

2019-06-27 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6046.
---
Resolution: Won't Fix

This appears to be a JDK bug according to the HADOOP JIRA, so I don't think we can 
do anything on our end. It hasn't happened for a while, so hopefully it will be 
resolved with time and JDK upgrades.

> test_partition_metadata_compatibility error: Hive query failing (HADOOP-13809)
> --
>
> Key: IMPALA-6046
> URL: https://issues.apache.org/jira/browse/IMPALA-6046
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.11.0, Impala 2.12.0
>Reporter: Bikramjeet Vig
>Priority: Major
>  Labels: flaky
>
> for the test 
> metadata/test_partition_metadata.py::TestPartitionMetadata::test_partition_metadata_compatibility,
>  a query to hive using beeline/HS2 is failing.
> From Hive logs:
> {noformat}
> 2017-10-11 17:59:13,631 ERROR transport.TSaslTransport 
> (TSaslTransport.java:open(315)) - SASL negotiation failure
> javax.security.sasl.SaslException: Invalid message format [Caused by 
> java.lang.IllegalStateException: zip file closed]
>   at 
> org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:107)
>   at 
> org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539)
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
>   at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>   at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: zip file closed
>   at java.util.zip.ZipFile.ensureOpen(ZipFile.java:634)
>   at java.util.zip.ZipFile.getEntry(ZipFile.java:305)
>   at java.util.jar.JarFile.getEntry(JarFile.java:227)
>   at sun.net.www.protocol.jar.URLJarFile.getEntry(URLJarFile.java:128)
>   at 
> sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:132)
>   at 
> sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:150)
>   at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:233)
>   at javax.xml.parsers.SecuritySupport$4.run(SecuritySupport.java:94)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.xml.parsers.SecuritySupport.getResourceAsStream(SecuritySupport.java:87)
>   at 
> javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:283)
>   at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:255)
>   at 
> javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2606)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2583)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2489)
>   at org.apache.hadoop.conf.Configuration.set(Configuration.java:1174)
>   at org.apache.hadoop.conf.Configuration.set(Configuration.java:1146)
>   at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:525)
>   at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:543)
>   at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:437)
>   at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:2803)
>   at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:2761)
>   at 
> org.apache.hive.service.auth.AuthenticationProviderFactory.getAuthenticationProvider(AuthenticationProviderFactory.java:61)
>   at 
> org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:104)
>   at 
> org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:102)
>   ... 8 more
> 2017-10-11 17:59:13,633 INFO  session.SessionState 
> (SessionState.java:dropPathAndUnregisterDeleteOnExit(785)) - Deleted 
> directory: /tmp/hive/jenkins/72505700-e690-4355-bdd2-55db2188a976 on fs with 
> scheme hdfs
> 2017-10-11 17:59:13,635 ERROR server.TThreadPoolServer 
> (TThreadPoolServer.java:run(297)) - Error occurred during processing of 
> message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: 
> Invalid message format
>   at 
> 


[jira] [Resolved] (IMPALA-7523) Planner Test failing with "Failed to assign regions to servers after 60000 millis."

2019-06-27 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7523.
---
Resolution: Cannot Reproduce

This appears to be a test infra bug rather than a product bug. It hasn't 
happened for 6 months and we've had no movement on it, so let's close for now.

> Planner Test failing with "Failed to assign regions to servers after 6 
> millis."
> ---
>
> Key: IMPALA-7523
> URL: https://issues.apache.org/jira/browse/IMPALA-7523
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Philip Zeyliger
>Priority: Critical
>  Labels: broken-build, flaky
>
> I've seen 
> {{org.apache.impala.planner.PlannerTest.org.apache.impala.planner.PlannerTest}}
>  fail with the following trace:
> {code}
> java.lang.IllegalStateException: Failed to assign regions to servers after 
> 6 millis.
>   at 
> org.apache.impala.datagenerator.HBaseTestDataRegionAssignment.performAssignment(HBaseTestDataRegionAssignment.java:153)
>   at 
> org.apache.impala.planner.PlannerTestBase.setUp(PlannerTestBase.java:120)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}
> I think we've seen it before as indicated in IMPALA-7061.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IMPALA-8719) SELECT from view fails with "AnalysisException: Column/field reference is ambiguous" after expression rewrite

2019-06-27 Thread Attila Jeges (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges updated IMPALA-8719:
-
Description: 
Minimal repro:

{code}
CREATE DATABASE tmp;
CREATE TABLE tmp.a_tbl (col1 BIGINT, col2 INT);
CREATE TABLE tmp.b_tbl (col1 BIGINT);

CREATE VIEW tmp.c_view
AS SELECT a.col1 as c1, a.col2 as c2
FROM tmp.a_tbl a INNER JOIN tmp.b_tbl b ON a.col1 = b.col1;

SELECT * FROM tmp.c_view WHERE ((c1 = 1 AND c2 = 2) OR (c1 = 11 AND c2 = 12)) 
AND c2 = 22;
{code}

This should return 0 rows, but instead the query fails with "ERROR: 
IllegalStateException: null" and the impalad log shows the following:

{code}
W0627 15:38:07.103969 31182 Expr.java:1201] a94d3fb39954fe29:ef89dd23] 
Not able to analyze after rewrite: org.apache.impala.common.AnalysisException: 
Column/field reference is ambiguous: 'col1' conjuncts: CompoundPredicate{op=OR, 
CompoundPredicate{op=AND, BoolLiteral{value=false} BinaryPredicate{op==, 
SlotRef{label=a.col1, path=col1, type=BIGINT, id=0} NumericLiteral{value=1, 
type=TINYINT}}} CompoundPredicate{op=AND, BoolLiteral{value=false} 
BinaryPredicate{op==, SlotRef{label=a.col1, path=col1, type=BIGINT, id=0} 
NumericLiteral{value=11, type=TINYINT BinaryPredicate{op==, 
SlotRef{label=a.col2, path=col2, type=INT, id=2} NumericLiteral{value=22, 
type=INT}, isInferred=true}
I0627 15:38:07.104437 31182 jni-util.cc:288] a94d3fb39954fe29:ef89dd23] 
java.lang.IllegalStateException
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at org.apache.impala.analysis.SlotRef.isBoundBySlotIds(SlotRef.java:222)
at org.apache.impala.analysis.Expr.isBoundBySlotIds(Expr.java:1230)
at org.apache.impala.analysis.Expr.isBoundBySlotIds(Expr.java:1230)
at org.apache.impala.analysis.Expr.isBoundBySlotIds(Expr.java:1230)
at 
org.apache.impala.planner.HdfsPartitionPruner.prunePartitions(HdfsPartitionPruner.java:124)
at 
org.apache.impala.planner.SingleNodePlanner.createHdfsScanPlan(SingleNodePlanner.java:1265)
at 
org.apache.impala.planner.SingleNodePlanner.createScanNode(SingleNodePlanner.java:1391)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1578)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:822)
at 
org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:658)
at 
org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:261)
at 
org.apache.impala.planner.SingleNodePlanner.createInlineViewPlan(SingleNodePlanner.java:1096)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1589)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:822)
at 
org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:658)
at 
org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:261)
at 
org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:151)
at org.apache.impala.planner.Planner.createPlan(Planner.java:105)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1136)
at 
org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1430)
at 
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1309)
at 
org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1217)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1187)
at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:153)
I0627 15:38:07.302868 31182 status.cc:124] a94d3fb39954fe29:ef89dd23] 
IllegalStateException: null
@  0x1b0b92c  impala::Status::Status()
@  0x22c3ba4  impala::JniUtil::GetJniExceptionMsg()
@  0x2101acf  impala::JniCall::Call<>()
@  0x20fec9d  impala::JniUtil::CallJniMethod<>()
@  0x20fd0e6  impala::Frontend::GetExecRequest()
@  0x212acb2  impala::ImpalaServer::ExecuteInternal()
@  0x212a79a  impala::ImpalaServer::Execute()
@  0x21acf82  impala::ImpalaServer::query()
@  0x26fdcf1  beeswax::BeeswaxServiceProcessor::process_query()
@  0x26fda3f  beeswax::BeeswaxServiceProcessor::dispatchCall()
@  0x26cc05c  impala::ImpalaServiceProcessor::dispatchCall()
@  0x1abad1d  apache::thrift::TDispatchProcessor::process()
@  0x1f09cc9  
apache::thrift::server::TAcceptQueueServer::Task::run()
@  0x1f002f8  impala::ThriftThread::RunRunnable()
@  0x1f01a1e  boost::_mfi::mf2<>::operator()()
@  0x1f018b4  boost::_bi::list3<>::operator()<>()
@  

[jira] [Updated] (IMPALA-8719) SELECT from view fails with "AnalysisException: Column/field reference is ambiguous" after expression rewrite

2019-06-27 Thread Attila Jeges (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges updated IMPALA-8719:
-
Description: 
Minimal repro:

{code}
CREATE DATABASE tmp;
CREATE EXTERNAL TABLE tmp.a_tbl (col1 BIGINT, col2 INT);
CREATE EXTERNAL TABLE tmp.b_tbl (col1 BIGINT);

CREATE VIEW tmp.c_view
AS SELECT a.col1 as c1, a.col2 as c2
FROM tmp.a_tbl a INNER JOIN tmp.b_tbl b ON a.col1 = b.col1;

SELECT * FROM tmp.c_view WHERE ((c1 = 1 AND c2 = 2) OR (c1 = 11 AND c2 = 12)) 
AND c2 = 22;
{code}

This should return 0 rows, but instead the query fails with "ERROR: 
IllegalStateException: null" and the impalad log shows the following:

{code}
W0627 15:38:07.103969 31182 Expr.java:1201] a94d3fb39954fe29:ef89dd23] 
Not able to analyze after rewrite: org.apache.impala.common.AnalysisException: 
Column/field reference is ambiguous: 'col1' conjuncts: CompoundPredicate{op=OR, 
CompoundPredicate{op=AND, BoolLiteral{value=false} BinaryPredicate{op==, 
SlotRef{label=a.col1, path=col1, type=BIGINT, id=0} NumericLiteral{value=1, 
type=TINYINT}}} CompoundPredicate{op=AND, BoolLiteral{value=false} 
BinaryPredicate{op==, SlotRef{label=a.col1, path=col1, type=BIGINT, id=0} 
NumericLiteral{value=11, type=TINYINT BinaryPredicate{op==, 
SlotRef{label=a.col2, path=col2, type=INT, id=2} NumericLiteral{value=22, 
type=INT}, isInferred=true}
I0627 15:38:07.104437 31182 jni-util.cc:288] a94d3fb39954fe29:ef89dd23] 
java.lang.IllegalStateException
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at org.apache.impala.analysis.SlotRef.isBoundBySlotIds(SlotRef.java:222)
at org.apache.impala.analysis.Expr.isBoundBySlotIds(Expr.java:1230)
at org.apache.impala.analysis.Expr.isBoundBySlotIds(Expr.java:1230)
at org.apache.impala.analysis.Expr.isBoundBySlotIds(Expr.java:1230)
at 
org.apache.impala.planner.HdfsPartitionPruner.prunePartitions(HdfsPartitionPruner.java:124)
at 
org.apache.impala.planner.SingleNodePlanner.createHdfsScanPlan(SingleNodePlanner.java:1265)
at 
org.apache.impala.planner.SingleNodePlanner.createScanNode(SingleNodePlanner.java:1391)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1578)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:822)
at 
org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:658)
at 
org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:261)
at 
org.apache.impala.planner.SingleNodePlanner.createInlineViewPlan(SingleNodePlanner.java:1096)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1589)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:822)
at 
org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:658)
at 
org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:261)
at 
org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:151)
at org.apache.impala.planner.Planner.createPlan(Planner.java:105)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1136)
at 
org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1430)
at 
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1309)
at 
org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1217)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1187)
at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:153)
I0627 15:38:07.302868 31182 status.cc:124] a94d3fb39954fe29:ef89dd23] 
IllegalStateException: null
@  0x1b0b92c  impala::Status::Status()
@  0x22c3ba4  impala::JniUtil::GetJniExceptionMsg()
@  0x2101acf  impala::JniCall::Call<>()
@  0x20fec9d  impala::JniUtil::CallJniMethod<>()
@  0x20fd0e6  impala::Frontend::GetExecRequest()
@  0x212acb2  impala::ImpalaServer::ExecuteInternal()
@  0x212a79a  impala::ImpalaServer::Execute()
@  0x21acf82  impala::ImpalaServer::query()
@  0x26fdcf1  beeswax::BeeswaxServiceProcessor::process_query()
@  0x26fda3f  beeswax::BeeswaxServiceProcessor::dispatchCall()
@  0x26cc05c  impala::ImpalaServiceProcessor::dispatchCall()
@  0x1abad1d  apache::thrift::TDispatchProcessor::process()
@  0x1f09cc9  
apache::thrift::server::TAcceptQueueServer::Task::run()
@  0x1f002f8  impala::ThriftThread::RunRunnable()
@  0x1f01a1e  boost::_mfi::mf2<>::operator()()
@  0x1f018b4  
{code}

[jira] [Created] (IMPALA-8719) SELECT from view fails with "AnalysisException: Column/field reference is ambiguous" after expression rewrite

2019-06-27 Thread Attila Jeges (JIRA)
Attila Jeges created IMPALA-8719:


 Summary: SELECT from view fails with "AnalysisException: 
Column/field reference is ambiguous" after expression rewrite
 Key: IMPALA-8719
 URL: https://issues.apache.org/jira/browse/IMPALA-8719
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.3.0
Reporter: Attila Jeges


Minimal repro:

{code}
CREATE DATABASE tmp;
CREATE EXTERNAL TABLE tmp.a_tbl (col1 BIGINT, col2 INT);
CREATE EXTERNAL TABLE tmp.b_tbl (col1 BIGINT);

CREATE VIEW tmp.c_view
AS SELECT a.col1 as c1, a.col2 as c2
FROM tmp.a_tbl a INNER JOIN tmp.b_tbl b ON a.col1 = b.col1;

SELECT * FROM tmp.c_view WHERE ((c1 = 1 AND c2 = 2) OR (c1 = 11 AND c2 = 12)) 
AND c2 = 22;
{code}
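
For context on the expected result: once the trailing {{c2 = 22}} conjunct is applied, neither branch of the OR can hold, so the predicate can never be true. A tiny standalone sketch of the same boolean expression (plain Java, nothing Impala-specific; the class and method names are just for illustration):

{code:java}
// Standalone sketch of the view predicate above: with c2 = 22 both OR
// branches are false, so the WHERE clause never matches and 0 rows qualify.
public class PredicateSketch {
  static boolean matches(long c1, int c2) {
    return ((c1 == 1 && c2 == 2) || (c1 == 11 && c2 == 12)) && c2 == 22;
  }

  public static void main(String[] args) {
    System.out.println(matches(1, 22));   // false
    System.out.println(matches(11, 22));  // false
  }
}
{code}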

We should get 0 rows, but instead the query fails with "ERROR: 
IllegalStateException: null" and in the impalad log we get the following:

{code}
W0627 15:38:07.103969 31182 Expr.java:1201] a94d3fb39954fe29:ef89dd23] 
Not able to analyze after rewrite: org.apache.impala.common.AnalysisException: 
Column/field reference is ambiguous: 'col1' conjuncts: CompoundPredicate{op=OR, 
CompoundPredicate{op=AND, BoolLiteral{value=false} BinaryPredicate{op==, 
SlotRef{label=a.col1, path=col1, type=BIGINT, id=0} NumericLiteral{value=1, 
type=TINYINT}}} CompoundPredicate{op=AND, BoolLiteral{value=false} 
BinaryPredicate{op==, SlotRef{label=a.col1, path=col1, type=BIGINT, id=0} 
NumericLiteral{value=11, type=TINYINT BinaryPredicate{op==, 
SlotRef{label=a.col2, path=col2, type=INT, id=2} NumericLiteral{value=22, 
type=INT}, isInferred=true}
I0627 15:38:07.104437 31182 jni-util.cc:288] a94d3fb39954fe29:ef89dd23] 
java.lang.IllegalStateException
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at org.apache.impala.analysis.SlotRef.isBoundBySlotIds(SlotRef.java:222)
at org.apache.impala.analysis.Expr.isBoundBySlotIds(Expr.java:1230)
at org.apache.impala.analysis.Expr.isBoundBySlotIds(Expr.java:1230)
at org.apache.impala.analysis.Expr.isBoundBySlotIds(Expr.java:1230)
at 
org.apache.impala.planner.HdfsPartitionPruner.prunePartitions(HdfsPartitionPruner.java:124)
at 
org.apache.impala.planner.SingleNodePlanner.createHdfsScanPlan(SingleNodePlanner.java:1265)
at 
org.apache.impala.planner.SingleNodePlanner.createScanNode(SingleNodePlanner.java:1391)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1578)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:822)
at 
org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:658)
at 
org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:261)
at 
org.apache.impala.planner.SingleNodePlanner.createInlineViewPlan(SingleNodePlanner.java:1096)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1589)
at 
org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:822)
at 
org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:658)
at 
org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:261)
at 
org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:151)
at org.apache.impala.planner.Planner.createPlan(Planner.java:105)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1136)
at 
org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1430)
at 
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1309)
at 
org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1217)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1187)
at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:153)
I0627 15:38:07.302868 31182 status.cc:124] a94d3fb39954fe29:ef89dd23] 
IllegalStateException: null
@  0x1b0b92c  impala::Status::Status()
@  0x22c3ba4  impala::JniUtil::GetJniExceptionMsg()
@  0x2101acf  impala::JniCall::Call<>()
@  0x20fec9d  impala::JniUtil::CallJniMethod<>()
@  0x20fd0e6  impala::Frontend::GetExecRequest()
@  0x212acb2  impala::ImpalaServer::ExecuteInternal()
@  0x212a79a  impala::ImpalaServer::Execute()
@  0x21acf82  impala::ImpalaServer::query()
@  0x26fdcf1  beeswax::BeeswaxServiceProcessor::process_query()
@  0x26fda3f  beeswax::BeeswaxServiceProcessor::dispatchCall()
@  0x26cc05c  impala::ImpalaServiceProcessor::dispatchCall()
@  0x1abad1d  apache::thrift::TDispatchProcessor::process()
{code}

[jira] [Assigned] (IMPALA-8718) Incorrect AnalysisException with outer join complex type column

2019-06-27 Thread Yongzhi Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen reassigned IMPALA-8718:


Assignee: Yongzhi Chen

> Incorrect AnalysisException with outer join complex type column
> ---
>
> Key: IMPALA-8718
> URL: https://issues.apache.org/jira/browse/IMPALA-8718
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.10.0
>Reporter: Tamas Mate
>Assignee: Yongzhi Chen
>Priority: Major
>
> Although the user is not explicitly specifying an {{IS NOT NULL}} predicate, the 
> query fails with:
> {code:java}
> org.apache.impala.common.AnalysisException: IS NOT NULL predicate does not 
> support complex types: col3 IS NOT NULL.
> {code}
> When a complex type is on the right-hand side of the join, it is 
> [wrapped|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1114]
>  by IsNullPredicate, as it could be null at the end of the join. That wrapping is 
> later caught by this 
> [condition|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java#L124],
>  and the following exception is thrown:
> {code:java}
> I0620 04:11:29.498865 474227 jni-util.cc:211] 
> java.lang.IllegalStateException: org.apache.impala.common.AnalysisException: 
> IS NOT NULL predicate does not support complex types: col3 IS NOT NULL
>   at org.apache.impala.analysis.Expr.analyzeNoThrow(Expr.java:362)
>   at 
> org.apache.impala.analysis.TupleIsNullPredicate.requiresNullWrapping(TupleIsNullPredicate.java:158)
>   at 
> org.apache.impala.analysis.TupleIsNullPredicate.wrapExpr(TupleIsNullPredicate.java:133)
>   at 
> org.apache.impala.analysis.TupleIsNullPredicate.wrapExprs(TupleIsNullPredicate.java:122)
>   at 
> org.apache.impala.planner.SingleNodePlanner.createInlineViewPlan(SingleNodePlanner.java:1042)
>   at 
> org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1454)
>   at 
> org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:778)
>   at 
> org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:616)
>   at 
> org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:259)
>   at 
> org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:149)
>   at org.apache.impala.planner.Planner.createPlan(Planner.java:98)
>   at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1005)
>   at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1101)
>   at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:156)
> Caused by: org.apache.impala.common.AnalysisException: IS NOT NULL predicate 
> does not support complex types: col3 IS NOT NULL
>   at 
> org.apache.impala.analysis.IsNullPredicate.analyzeImpl(IsNullPredicate.java:127)
>   at org.apache.impala.analysis.Expr.analyze(Expr.java:343)
>   at org.apache.impala.analysis.Expr.analyzeNoThrow(Expr.java:360)
>   ... 13 more
> {code}
> I believe the null-wrapping itself is necessary, but the error message raised for this 
> condition is misleading. The issue can be reproduced with the following queries.
> {code:java}
> create table sample_test_1 
> (col1 string,
> col2 string,
> col3 array>);
> create table sample_test_2 
> (col1 string,
> col2 string);
> with leftSide as
> (
> select col1
>   from sample_test_2
> ),
> rightSide as
> (
> select t.col1,
>rank() over(order by t.col1) as rnk
>   from sample_test_1 t
>   left outer join t.col3
> )
> select *
>   from leftSide l
>left join rightSide r
>on l.col1 = r.col1
> {code}
> cc.: [~ychena]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8718) Incorrect AnalysisException with outer join complex type column

2019-06-27 Thread Tamas Mate (JIRA)
Tamas Mate created IMPALA-8718:
--

 Summary: Incorrect AnalysisException with outer join complex type 
column
 Key: IMPALA-8718
 URL: https://issues.apache.org/jira/browse/IMPALA-8718
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 2.10.0
Reporter: Tamas Mate


Although the user is not explicitly specifying an {{IS NOT NULL}} predicate, the 
query fails with:
{code:java}
org.apache.impala.common.AnalysisException: IS NOT NULL predicate does not 
support complex types: col3 IS NOT NULL.
{code}
When a complex type is on the right-hand side of the join, it is 
[wrapped|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1114]
 by IsNullPredicate, as it could be null at the end of the join. That wrapping is 
later caught by this 
[condition|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java#L124],
 and the following exception is thrown:
{code:java}
I0620 04:11:29.498865 474227 jni-util.cc:211] java.lang.IllegalStateException: 
org.apache.impala.common.AnalysisException: IS NOT NULL predicate does not 
support complex types: col3 IS NOT NULL
  at org.apache.impala.analysis.Expr.analyzeNoThrow(Expr.java:362)
  at 
org.apache.impala.analysis.TupleIsNullPredicate.requiresNullWrapping(TupleIsNullPredicate.java:158)
  at 
org.apache.impala.analysis.TupleIsNullPredicate.wrapExpr(TupleIsNullPredicate.java:133)
  at 
org.apache.impala.analysis.TupleIsNullPredicate.wrapExprs(TupleIsNullPredicate.java:122)
  at 
org.apache.impala.planner.SingleNodePlanner.createInlineViewPlan(SingleNodePlanner.java:1042)
  at 
org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1454)
  at 
org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:778)
  at 
org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:616)
  at 
org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:259)
  at 
org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:149)
  at org.apache.impala.planner.Planner.createPlan(Planner.java:98)
  at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1005)
  at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1101)
  at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:156)
Caused by: org.apache.impala.common.AnalysisException: IS NOT NULL predicate 
does not support complex types: col3 IS NOT NULL
  at 
org.apache.impala.analysis.IsNullPredicate.analyzeImpl(IsNullPredicate.java:127)
  at org.apache.impala.analysis.Expr.analyze(Expr.java:343)
  at org.apache.impala.analysis.Expr.analyzeNoThrow(Expr.java:360)
  ... 13 more
{code}
I believe the null-wrapping itself is necessary, but the error message raised for this 
condition is misleading. The issue can be reproduced with the following queries.
{code:java}
create table sample_test_1 
(col1 string,
col2 string,
col3 array>);

create table sample_test_2 
(col1 string,
col2 string);

with leftSide as
(
select col1
  from sample_test_2
),
rightSide as
(
select t.col1,
   rank() over(order by t.col1) as rnk
  from sample_test_1 t
  left outer join t.col3
)
select *
  from leftSide l
   left join rightSide r
   on l.col1 = r.col1
{code}
cc.: [~ychena]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (IMPALA-5031) UBSAN clean and method for testing UBSAN cleanliness

2019-06-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874042#comment-16874042
 ] 

ASF subversion and git services commented on IMPALA-5031:
-

Commit 03bf2d25663dfa09133c466709a485e03c544236 in impala's branch 
refs/heads/master from Jim Apple
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=03bf2d2 ]

IMPALA-5031: link fesupport so FE tests run with UBSAN

This commit enables the frontend and JDBC tests to run under
UBSAN. Before this commit, they fail with an error message about
finding the UBSAN support functions like
__ubsan_handle_load_invalid_value. Linking those functions into
libfesupport.so requires using the linker flag --whole-archive so that
these symbols, which are not directly referenced by the source code
used to build libfesupport.so, are included.

This patch also enables the custom cluster tests to run under UBSAN;
they recently gained frontend tests in commit
d72f3330c1edc9086ba120e6d3469a75c0aea083.

Change-Id: I42049fb3e2de83aee0d0e00e2703788afde739e2
Reviewed-on: http://gerrit.cloudera.org:8080/13710
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> UBSAN clean and method for testing UBSAN cleanliness
> 
>
> Key: IMPALA-5031
> URL: https://issues.apache.org/jira/browse/IMPALA-5031
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Infrastructure
>Affects Versions: Impala 2.9.0
>Reporter: Jim Apple
>Assignee: Jim Apple
>Priority: Minor
>
> http://releases.llvm.org/3.8.0/tools/clang/docs/UndefinedBehaviorSanitizer.html
>  builds are supported after https://gerrit.cloudera.org/#/c/6186/, but 
> Impala's test suite triggers many errors under UBSAN. Those errors should be 
> fixed and then there should be a way to run the test suite under UBSAN and 
> fail if there were any errors detected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8702) PlannerTest.testJoins flakiness due to reliance on an exact cardinality estimate

2019-06-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874037#comment-16874037
 ] 

ASF subversion and git services commented on IMPALA-8702:
-

Commit c9937fd99a3a0f52124797374da76a9d2d25f650 in impala's branch 
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c9937fd ]

IMPALA-8702: Remove the PlannerTestOption of VALIDATE_CARDINALITY to avoid a 
flaky test

Removed the PlannerTestOption of VALIDATE_CARDINALITY in testJoins()
and testFkPkJoinDetection() to avoid checking the estimated cardinality which
may be slightly different each time due to IMPALA-7608 when an hdfs table
without stats is involved in a query.

Change-Id: Ie7fce59ecef45df7edc71cde1f8166ccfd45d187
Reviewed-on: http://gerrit.cloudera.org:8080/13717
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> PlannerTest.testJoins flakiness due to reliance on an exact cardinality 
> estimate
> 
>
> Key: IMPALA-8702
> URL: https://issues.apache.org/jira/browse/IMPALA-8702
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.3.0
>Reporter: Bikramjeet Vig
>Assignee: Fang-Yu Rao
>Priority: Critical
>  Labels: broken-build
>
> The following test case is failing in PlannerTest.testJoins.
>  
> {noformat}
> Section PLAN of query:
> select * from functional_text_lzo.emptytable a inner join
> functional_text_lzo.alltypes b on a.f2 = b.int_col
> Actual does not match expected result:
> PLAN-ROOT SINK
> |
> 02:HASH JOIN [INNER JOIN]
> |  hash predicates: b.int_col = a.f2
> |  runtime filters: RF000 <- a.f2
> |  row-size=96B cardinality=5.66K
> ^
> |
> |--00:SCAN HDFS [functional_text_lzo.emptytable a]
> | partitions=0/0 files=0 size=0B
> | row-size=16B cardinality=0
> |
> 01:SCAN HDFS [functional_text_lzo.alltypes b]
>HDFS partitions=24/24 files=24 size=123.54KB
>runtime filters: RF000 -> b.int_col
>row-size=80B cardinality=5.66K
> Expected:
> PLAN-ROOT SINK
> |
> 02:HASH JOIN [INNER JOIN]
> |  hash predicates: b.int_col = a.f2
> |  runtime filters: RF000 <- a.f2
> |  row-size=96B cardinality=5.65K
> |
> |--00:SCAN HDFS [functional_text_lzo.emptytable a]
> | partitions=0/0 files=0 size=0B
> | row-size=16B cardinality=0
> |
> 01:SCAN HDFS [functional_text_lzo.alltypes b]
>HDFS partitions=24/24 files=24 size=123.32KB
>runtime filters: RF000 -> b.int_col
>row-size=80B cardinality=5.65K
> Verbose plan:
> F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> Per-Host Resources: mem-estimate=130.94MB mem-reservation=2.95MB 
> thread-reservation=2 runtime-filters-memory=1.00MB
>   PLAN-ROOT SINK
>   |  mem-estimate=0B mem-reservation=0B thread-reservation=0
>   |
>   02:HASH JOIN [INNER JOIN]
>   |  hash predicates: b.int_col = a.f2
>   |  fk/pk conjuncts: assumed fk/pk
>   |  runtime filters: RF000[bloom] <- a.f2
>   |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
> thread-reservation=0
>   |  tuple-ids=1,0 row-size=96B cardinality=5.66K
>   |  in pipelines: 01(GETNEXT), 00(OPEN)
>   |
>   |--00:SCAN HDFS [functional_text_lzo.emptytable a]
>   | partitions=0/0 files=0 size=0B
>   | stored statistics:
>   |   table: rows=unavailable size=unavailable
>   |   partitions: 0/0 rows=unavailable
>   |   columns missing stats: field
>   | extrapolated-rows=disabled max-scan-range-rows=0
>   | mem-estimate=0B mem-reservation=0B thread-reservation=0
>   | tuple-ids=0 row-size=16B cardinality=0
>   | in pipelines: 00(GETNEXT)
>   |
>   01:SCAN HDFS [functional_text_lzo.alltypes b]
>  HDFS partitions=24/24 files=24 size=123.54KB
>  runtime filters: RF000[bloom] -> b.int_col
>  stored statistics:
>table: rows=unavailable size=unavailable
>partitions: 0/24 rows=unavailable
>columns missing stats: int_col, id, bool_col, tinyint_col, 
> smallint_col, bigint_col, float_col, double_col, date_string_col, string_col, 
> timestamp_col
>  extrapolated-rows=disabled max-scan-range-rows=unavailable
>  mem-estimate=128.00MB mem-reservation=8.00KB thread-reservation=1
>  tuple-ids=1 row-size=80B cardinality=5.66K
>  in pipelines: 01(GETNEXT)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8341) Data cache for remote reads

2019-06-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874036#comment-16874036
 ] 

ASF subversion and git services commented on IMPALA-8341:
-

Commit e29b387ea10739e78075bac8170e45722d4b9940 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e29b387 ]

IMPALA-8341: [DOCS] Describe the setting for remote data caching

Change-Id: I7dd958e4de109b46eaf906fe93145799af123b3f
Reviewed-on: http://gerrit.cloudera.org:8080/13724
Tested-by: Impala Public Jenkins 
Reviewed-by: Michael Ho 


> Data cache for remote reads
> ---
>
> Key: IMPALA-8341
> URL: https://issues.apache.org/jira/browse/IMPALA-8341
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Critical
> Fix For: Impala 3.3.0
>
>
> When running in public cloud (e.g. AWS with S3) or in certain private cloud 
> settings (e.g. data stored in object store), the computation and storage are 
> no longer co-located. This breaks the typical pattern in which Impala query 
> fragment instances are scheduled where the data is located. In this 
> setting, the network bandwidth requirements of both the NICs and the 
> top-of-rack switches will go up quite a lot, as the network traffic includes the data 
> fetch in addition to the shuffling exchange traffic of intermediate results.
> To mitigate the pressure on the network, one can build a storage backed cache 
> at the compute nodes to cache the working set. With deterministic scan range 
> scheduling, each compute node should hold non-overlapping partitions of the 
> data set. 
> An initial prototype of the cache was posted here: 
> [https://gerrit.cloudera.org/#/c/12683/] but it probably can benefit from a 
> better eviction algorithm (e.g. LRU instead of FIFO) and better locking (e.g. 
> not holding the lock while doing IO).
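> A minimal sketch of the LRU idea mentioned above (plain Java via an access-ordered
> LinkedHashMap; the real cache lives in the C++ backend, so this says nothing about its
> actual API):
> {code:java}
> import java.util.LinkedHashMap;
> import java.util.Map;
>
> // Illustrative LRU cache: the access-ordered LinkedHashMap evicts the least
> // recently used entry once capacity is exceeded. Not the Impala implementation.
> public class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
>   private final int capacity;
>
>   public LruCacheSketch(int capacity) {
>     super(16, 0.75f, /*accessOrder=*/true);
>     this.capacity = capacity;
>   }
>
>   @Override
>   protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
>     return size() > capacity;
>   }
>
>   public static void main(String[] args) {
>     LruCacheSketch<String, byte[]> cache = new LruCacheSketch<>(2);
>     cache.put("block1", new byte[8]);
>     cache.put("block2", new byte[8]);
>     cache.get("block1");                 // touch block1 so block2 becomes LRU
>     cache.put("block3", new byte[8]);    // evicts block2, not block1
>     System.out.println(cache.keySet());  // [block1, block3]
>   }
> }
> {code}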



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8476) Replace Sentry admin check workaround with proper Sentry API

2019-06-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874039#comment-16874039
 ] 

ASF subversion and git services commented on IMPALA-8476:
-

Commit ee2b3b9bfaf7950a8fb3f87ea44c29d0f6a13be4 in impala's branch 
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ee2b3b9 ]

IMPALA-8476: Replace Sentry admin check workaround with proper Sentry API

Impala uses a workaround to detect if a user is a Sentry admin by calling
the Sentry API to list privileges associated with the user since previously
Sentry did not provide a suitable API to perform this check.
This patch invokes a new API in SENTRY-2440 to perform the Sentry admin check.

Also modified test_sentry.py to exercise the code paths corresponding to the
following 3 different types of users: (i) a Sentry admin, (ii) an existing
user which is not a Sentry admin, and (iii) a non-existing user.

Testing:
1. Passed the tests in the revised test_sentry.py.

Change-Id: I5a27140d401494bc372ad0da96ada57bda94cd82
Reviewed-on: http://gerrit.cloudera.org:8080/13346
Reviewed-by: Fredy Wijaya 
Tested-by: Impala Public Jenkins 


> Replace Sentry admin check workaround with proper Sentry API
> 
>
> Key: IMPALA-8476
> URL: https://issues.apache.org/jira/browse/IMPALA-8476
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Fredy Wijaya
>Assignee: Fang-Yu Rao
>Priority: Minor
>  Labels: newbie
> Fix For: Impala 3.3.0
>
>
> Impala uses a workaround to detect if a user is a Sentry admin by calling the 
> list-privileges API, because Sentry previously did not provide an API to tell whether a 
> user is a Sentry admin: 
> https://github.com/apache/impala/blob/d820952d86d34ba887c55a09e58b735cbef866c2/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java#L378-L393
> https://issues.apache.org/jira/browse/SENTRY-2440 adds a new API. We 
> should update Impala to use the new, proper Sentry API.
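> Conceptually the change looks something like the sketch below (all names here are
> hypothetical placeholders; the actual client API is whatever SENTRY-2440 defines):
> {code:java}
> // Illustrative only: 'SentryClient', 'listPrivilegesByUser' and 'isAdmin'
> // are placeholder names, not the real Sentry API.
> interface SentryClient {
>   java.util.List<String> listPrivilegesByUser(String user) throws Exception;
>   boolean isAdmin(String user) throws Exception;
> }
>
> class SentryAdminCheckSketch {
>   // Old workaround (as described above): attempt a privilege-listing call and
>   // treat an authorization failure as "not a Sentry admin".
>   static boolean isAdminWorkaround(SentryClient client, String user) {
>     try {
>       client.listPrivilegesByUser(user);
>       return true;
>     } catch (Exception e) {
>       return false;
>     }
>   }
>
>   // New approach: ask Sentry directly through a dedicated admin-check call.
>   static boolean isAdminDirect(SentryClient client, String user) throws Exception {
>     return client.isAdmin(user);
>   }
> }
> {code}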



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7608) Estimate row count from file size when no stats available

2019-06-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874038#comment-16874038
 ] 

ASF subversion and git services commented on IMPALA-7608:
-

Commit c9937fd99a3a0f52124797374da76a9d2d25f650 in impala's branch 
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c9937fd ]

IMPALA-8702: Remove the PlannerTestOption of VALIDATE_CARDINALITY to avoid a 
flaky test

Removed the PlannerTestOption of VALIDATE_CARDINALITY in testJoins()
and testFkPkJoinDetection() to avoid checking the estimated cardinality which
may be slightly different each time due to IMPALA-7608 when an hdfs table
without stats is involved in a query.

Change-Id: Ie7fce59ecef45df7edc71cde1f8166ccfd45d187
Reviewed-on: http://gerrit.cloudera.org:8080/13717
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Estimate row count from file size when no stats available
> -
>
> Key: IMPALA-7608
> URL: https://issues.apache.org/jira/browse/IMPALA-7608
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Impala makes heavy use of stats, which is a good thing. Stats feed into query 
> planning where they allow the planner to choose among a fixed set of 
> alternatives such as: do I put t1 on the build or probe side of a join?
> Because the planner decisions tend to be discrete, we only need enough 
> information to decide whether to do A or B (or, more generally, to choose 
> among a set of choices A, B, C, ... N).
> Often data sizes are vastly different on different paths. Stats help refine 
> these numbers, but much of the information just needs to be in the ball park: 
> is table t1 larger or smaller than t2? Often, one table is much larger than 
> the other, so even a rough size estimate will force the right decision (put 
> the smaller table on the build side of a join.)
> Today, if Impala has no stats, it refuses to even consider table size. 
> Consider the following unit test:
> {noformat}
> runTest("SELECT a FROM functional.tinytable;", -1);
> {noformat}
> This plans the given query, then verifies that the expected result 
> cardinality is the number given. In this case, {{tinytable}} has no stats. 
> So, we don't know the cardinality. OK...
> The table turns out to be 3 rows. Perhaps I join this to a hypothetical 
> {{hugetable}} of 1 million rows. Without even a guess at cardinality, Impala 
> can't choose a good plan.
> The suggestion is to use table size to estimate row cardinality. Come up with 
> some assumed row width, say 100. Then, estimate row count as {{file size / 
> est. row width}}. This gives a ballpark number that would be plenty good for 
> the planner to choose the proper plan much of the time. 
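> A minimal sketch of that fallback estimate (plain Java; the 100-byte row width and the
> names are assumptions for illustration only, not Impala constants or APIs):
> {code:java}
> // Last-ditch cardinality estimate when no stats are available:
> // rows ~= total file size / assumed row width. Purely illustrative.
> public class RowCountEstimate {
>   static final long ASSUMED_ROW_WIDTH_BYTES = 100;  // assumed default, see above
>
>   static long estimateRowCount(long totalFileSizeBytes) {
>     if (totalFileSizeBytes <= 0) return 0;
>     return Math.max(1, totalFileSizeBytes / ASSUMED_ROW_WIDTH_BYTES);
>   }
>
>   public static void main(String[] args) {
>     System.out.println(estimateRowCount(0));          // 0 (empty table)
>     System.out.println(estimateRowCount(1_000_000));  // 10000
>   }
> }
> {code}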
> Since this is such an easy estimate to make, and will address the occasional 
> case in which stats are not available, it seems a shame to not take advantage 
> of this information.
> In terms of implementation, {{HdfsScanNode.computeCardinalities()}} already 
> uses some extrapolation, if enabled. It can be extended to do the last-ditch 
> extrapolation suggested above if, after the current techniques, the 
> cardinality is still undefined.
> If we apply this simple fix in a prototype build, the new test result is 
> closer to reality:
> {noformat}
> runTest("SELECT a FROM functional.tinytable;", 1);
> {noformat}
> Given that the fix is so simple, any reason not to use the file size, when 
> available? Is 100 a reasonable assumed row width? Should this functionality 
> always be on, not just when enabled using the back-end config?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8585) Impala ACID tests

2019-06-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874040#comment-16874040
 ] 

ASF subversion and git services commented on IMPALA-8585:
-

Commit 524f97136a17bb87cc3605371b1e600739f1e489 in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=524f971 ]

IMPALA-8585: Fix upgraded table related tests in AcidUtilsTest

The following changes were ok on their own, but together led to
broken tests on Hive 3:
https://gerrit.cloudera.org/#/c/13428/
https://gerrit.cloudera.org/#/c/13427/

Change-Id: Iad41c1a7b0458a405b95f62d8e0ecd93947f492d
Reviewed-on: http://gerrit.cloudera.org:8080/13730
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Impala ACID tests
> -
>
> Key: IMPALA-8585
> URL: https://issues.apache.org/jira/browse/IMPALA-8585
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: impala-acid
>
> Umbrella Jira for adding tests about ACID functionality, e.g.:
>  * Ordinary table that was upgraded to ACID table
>  * Inserting data in hive and querying it in Impala concurrently
>  * Compute stats interoperability between Hive and Impala
>  * Partitioned tables, dynamic partitioning



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8369) Impala should be able to interoperate with Hive 3.1.0

2019-06-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874041#comment-16874041
 ] 

ASF subversion and git services commented on IMPALA-8369:
-

Commit 4d0578dc812ad51edc589a9d7dd1fd23a041370b in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4d0578d ]

IMPALA-8369: Bump CDP_BUILD_NUMBER and re-enable test_max_nesting_depth

Switch to a newer version of CDP Hive where HIVE-20833 is reverted.
HIVE-20833 was backported without HIVE-20221, which broke the
handling of column PARTITION_PARAMS.PARAM_KEY in HMS, leading
to several test failures in Impala.

The new HIVE version also includes the fix for HIVE-21796, so
test_max_nesting_depth could be also re-enabled with Hive 3.

Change-Id: I1d6f4e29997c9cf2238e1d614f8d1ed7d35ffe92
Reviewed-on: http://gerrit.cloudera.org:8080/13723
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Impala should be able to interoperate with Hive 3.1.0
> -
>
> Key: IMPALA-8369
> URL: https://issues.apache.org/jira/browse/IMPALA-8369
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: impala-acid
>
> Currently, Impala only works with Hive 2.1.1. Since Hive 3.1.0 has been 
> released for a while, it would be good to add support for Hive 3.1.0 (HMS 
> 3.1.0). This patch will focus on the ability to connect to HMS 3.1.0 and run 
> existing tests. It will not focus on adding support for newer features like 
> ACID in Hive 3.1.0, which can be taken up as a separate JIRA.
> It would be good to make changes to the Impala source code such that it can work 
> with both Hive 2.1.0 and Hive 3.1.0 without the need to create a separate 
> branch. However, this should be an aspirational goal. If we hit a blocker, we 
> should investigate alternative approaches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8669) Support giving a terminal commit in compare_branches.py

2019-06-27 Thread Quanlong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-8669.

Resolution: Fixed

> Support giving a terminal commit in compare_branches.py
> ---
>
> Key: IMPALA-8669
> URL: https://issues.apache.org/jira/browse/IMPALA-8669
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 2.12.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> Currently, bin/compare_branches.py (used in the cherrypick-2.x-and-test Jenkins 
> job) will cherry-pick as many commits as possible. However, a clean pick from 
> the master branch does not mean it will always pass the tests. If we find such 
> a problematic commit, we'd like to let the cherrypick-2.x-and-test Jenkins 
> job cherry-pick only up to the commit before it (see the sketch at the end of this 
> description), and then submit a separate patch for the problematic commit for review.
> For example, currently the HEAD of the 2.x branch is 
> e6c1eb85eb31264eaf5ed92782bf181225ce9581. bin/compare_branches.py will pick 
> the following commits of the master branch:
> |200870275|IMPALA-7314: Doc generation should fail on error| |
> |940d536f2|IMPALA-7252: Backport rate limiting of fadvise calls into 
> toolchain glog| |
> |c7d2c2ec7|IMPALA-7298: Stop passing IP address as hostname in Kerberos 
> principal| |
> |70e2d57fc|IMPALA-7315: fix test_update_with_clear_entries_flag race| |
> |1d491d648|IMPALA-7259: Improve Impala shell performance| |
> |2a40e8f2a|test_recover_partitions.py had asserts that were always true.| |
> |649397e37|KUDU-2492: Make the use of SO_REUSEPORT conditional on it being 
> defined| |
> |eaea7d289|IMPALA-6677: [DOCS] Document the next_day function| |
> |79eef4924|IMPALA-7256: Aggregator mem usage isn't reflected in summary| |
> |42809cbd1|IMPALA-7173: [DOCS] Added check options in the load balancer 
> examples| |
> |cb0f8a0ad|IMPALA-3675: part 1: -Werror for ASAN| |
> |f7efba236|IMPALA-5031: Fix undefined behavior: memset NULL| |
> |514dbb79a|IMPALA-6299: Use LlvmCodeGen's internal list of white-listed CPU 
> attributes for handcrafting IRs| |
> |f9e7d9385|IMPALA-7291: [DOCS] Note about no codegen support for CHAR| |
> |ac4acf1b7|IMPALA-3040: Remove cache directives if a partition is dropped 
> externally| |
> |02389d4dd|IMPALA-6174: [DOCS] Fixed the seed data type for RAND and RANDOM 
> functions| |
> |21d0c06a4|IMPALA-5826 IMPALA-7162: [DOCS] Documented the 
> IDLE_SESSION_TIMEOUT query option| |
> |b76207c59|IMPALA-7330. After LOAD DATA, only refresh affected partition| |
> |cdc8b9ba7|IMPALA-5031: Fix undefined behavior: memset NULL| |
> |def5c881b|IMPALA-7218: [DOCS] Support column list in ALTER VIEW| |
> |c333b5526|IMPALA-7257. Support Kudu tables in LocalCatalog|<-- Clean picked 
> but introduced compile errors.|
> |ba8138694|IMPALA-7276. Support CREATE TABLE AS SELECT with LocalCatalog| |
> c333b5526 is a clean pick but it will introduce a compile error in FE:
> {code:java}
> [ERROR] 
> /home/ubuntu/Impala/fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java:[193,68]
>  cannot find symbol
> [ERROR] symbol:   method startsWith(java.lang.String)
> [ERROR] location: class org.hamcrest.CoreMatchers{code}
> We hope the cherrypick-2.x-and-test Jenkins job can pick till def5c881b (the 
> previous commit) and merge these good commits. Then we can just submit one 
> patch to fix the pick of c333b5526.
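> A tiny sketch of the requested cut-off behaviour (plain Java over a list of commit
> hashes; compare_branches.py itself is a Python script, so this is not its code):
> {code:java}
> import java.util.List;
>
> // Illustrative only: given candidate commits in apply order, stop just
> // before the given terminal (problematic) commit.
> class TerminalCommitSketch {
>   static List<String> commitsToPick(List<String> candidates, String terminalCommit) {
>     int idx = candidates.indexOf(terminalCommit);
>     return idx < 0 ? candidates : candidates.subList(0, idx);
>   }
>
>   public static void main(String[] args) {
>     List<String> candidates = List.of("def5c881b", "c333b5526", "ba8138694");
>     // Stop before c333b5526, which cherry-picks cleanly but breaks the FE build.
>     System.out.println(commitsToPick(candidates, "c333b5526"));  // [def5c881b]
>   }
> }
> {code}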



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Updated] (IMPALA-8715) Test failures when catalogv2 is enabled

2019-06-27 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-8715:

Labels: catalog-v2  (was: )

> Test failures when catalogv2 is enabled
> ---
>
> Key: IMPALA-8715
> URL: https://issues.apache.org/jira/browse/IMPALA-8715
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: catalog-v2
>
> metadata.test_compute_stats.TestComputeStats.test_compute_stats_compression_codec[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none]
> metadata.test_refresh_partition.TestRefreshPartition.test_refresh_partition_num_rows[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none]
> query_test.test_observability.TestObservability.test_query_profile_storage_load_time_filesystem
> query_test.test_observability.TestObservability.test_query_profile_storage_load_time
> When catalogv2 was enabled in IMPALA-8627, we saw the above failures. It is 
> likely that {{TestObservability}} is failing due to IMPALA-7322, which was 
> merged around the same time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8717) impala-shell support for HiveServer2 HTTP endpoint

2019-06-27 Thread bharath v (JIRA)
bharath v created IMPALA-8717:
-

 Summary: impala-shell support for HiveServer2 HTTP endpoint
 Key: IMPALA-8717
 URL: https://issues.apache.org/jira/browse/IMPALA-8717
 Project: IMPALA
  Issue Type: Sub-task
  Components: Clients
Affects Versions: Impala 3.3.0
Reporter: bharath v
Assignee: bharath v


Having impala-shell support for connecting to the HTTP HS2 endpoint would be 
super helpful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Assigned] (IMPALA-8648) Impala ACID read stress tests

2019-06-27 Thread Dinesh Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Garg reassigned IMPALA-8648:
---

Assignee: Csaba Ringhofer  (was: Todd Lipcon)

> Impala ACID read stress tests
> -
>
> Key: IMPALA-8648
> URL: https://issues.apache.org/jira/browse/IMPALA-8648
> Project: IMPALA
>  Issue Type: Test
>Reporter: Dinesh Garg
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: impala-acid
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org