[jira] [Comment Edited] (IMPALA-11674) Fix IsPeekTimeoutTException and IsReadTimeoutTException for thrift-0.16.0

2022-10-19 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620716#comment-17620716
 ] 

Wenzhe Zhou edited comment on IMPALA-11674 at 10/20/22 5:10 AM:


Non-interactive impala-shell commands cannot reuse a connection. How about running 
queries with JDBC, or running multiple queries from impala-shell with a query file?
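
A minimal sketch of the query-file approach (file path and coordinator address are hypothetical): `impala-shell -f` runs every statement in the file over a single connection, unlike one `-q` flag per invocation, which reconnects each time.

```shell
# Hypothetical file and host. Write several statements to one query file:
cat > /tmp/queries.sql <<'EOF'
select 1;
select 2;
EOF
# Then run them all over one connection (requires a running coordinator):
# impala-shell -i localhost:21000 -f /tmp/queries.sql
echo "wrote $(wc -l < /tmp/queries.sql) statements"
```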


was (Author: wzhou):
Non-interactive impala-shell commands cannot reuse connection. How about run 
queries with jdbc?

> Fix IsPeekTimeoutTException and IsReadTimeoutTException for thrift-0.16.0
> -
>
> Key: IMPALA-11674
> URL: https://issues.apache.org/jira/browse/IMPALA-11674
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0
>Reporter: Wenzhe Zhou
>Assignee: Riza Suminto
>Priority: Major
>
> IMPALA-7825 upgraded the Thrift version from 0.9.3 to 0.11.0, and IMPALA-11384 
> upgraded the C++ Thrift components from 0.11.0 to 0.16.0. 
> The functions IsPeekTimeoutTException() and IsReadTimeoutTException() in 
> be/src/rpc/thrift-util.cc make assumptions about the implementation of read(), 
> peek(), write() and write_partial() in TSocket.cpp and TSSLSocket.cpp. The 
> functions read() and peek() in TSSLSocket.cpp were changed in versions 0.11.0 
> and 0.16.0 to throw different exceptions on timeout. This causes 
> IsPeekTimeoutTException() and IsReadTimeoutTException() to return wrong values 
> after the Thrift upgrade, which in turn causes TAcceptQueueServer::Peek() to 
> rethrow the exception to its caller TAcceptQueueServer::run(), making 
> TAcceptQueueServer::run() close the connection.
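
The failure mode described above (helpers assuming implementation details of the socket code) can be sketched generically. This is illustrative only, not Impala's or Thrift's actual code: matching on a stable error-type code survives upgrades, while matching on message text is exactly the kind of detail that changed between Thrift releases.

```java
// Illustrative sketch, not Impala/Thrift code. All names are assumptions.
public class TimeoutCheck {
    static class TransportException extends RuntimeException {
        static final int TIMED_OUT = 3; // analogous to TTransportException::TIMED_OUT
        final int type;
        TransportException(int type, String msg) { super(msg); this.type = type; }
    }

    // Fragile: depends on the wording one particular version happened to use.
    static boolean isTimeoutByMessage(TransportException e) {
        return e.getMessage().contains("EAGAIN (timed out)");
    }

    // Robust: depends only on the exception's declared type constant.
    static boolean isTimeoutByType(TransportException e) {
        return e.type == TransportException.TIMED_OUT;
    }

    public static void main(String[] args) {
        // A newer-style timeout message no longer matches the old string check.
        TransportException e = new TransportException(
            TransportException.TIMED_OUT, "THRIFT_POLL (timed out)");
        System.out.println(isTimeoutByMessage(e) + " " + isTimeoutByType(e)); // false true
    }
}
```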



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-11677) FireInsertEvents function can be very slow for tables with large number of partitions.

2022-10-19 Thread Qihong Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620713#comment-17620713
 ] 

Qihong Jiang edited comment on IMPALA-11677 at 10/20/22 5:05 AM:
-

Hello [~csringhofer]! I'm only using non-transactional tables right now and 
it's equally slow. I tried using the Bulk API last week, but the improvement 
was very small. Then I referenced the code in impala-3.x and modified it to be an 
asynchronous call. The execution speed is greatly improved, but I don't know if 
there is any risk. 
{code:java}
public static List<Long> fireInsertEvents(MetaStoreClient msClient,
    TableInsertEventInfo insertEventInfo, String dbName, String tableName) {
  if (!insertEventInfo.isTransactional()) {
    LOG.info("fire the insert events asynchronously.");
    ExecutorService fireInsertEventThread = Executors.newSingleThreadExecutor();
    CompletableFuture.runAsync(() -> {
      try {
        fireInsertEventHelper(msClient.getHiveClient(),
            insertEventInfo.getInsertEventReqData(),
            insertEventInfo.getInsertEventPartVals(), dbName, tableName);
      } catch (Exception e) {
        LOG.error("failed to async call fireInsertEventHelper", e);
      } finally {
        msClient.close();
        LOG.info("fire the insert events asynchronously end.");
      }
    }, fireInsertEventThread).thenRun(fireInsertEventThread::shutdown);
  } else {
    Stopwatch sw = Stopwatch.createStarted();
    try {
      fireInsertTransactionalEventHelper(msClient.getHiveClient(),
          insertEventInfo, dbName, tableName);
    } catch (Exception e) {
      LOG.error("Failed to fire insert event. Some tables might not be"
          + " refreshed on other impala clusters.", e);
    } finally {
      LOG.info("Time taken to fire insert events on table {}.{}: {} msec",
          dbName, tableName, sw.stop().elapsed(TimeUnit.MILLISECONDS));
      msClient.close();
    }
  }
  return Collections.emptyList();
}
{code}
     I am an impala newbie. I hope to get your guidance. Thank you!
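
One concrete risk worth noting about the fire-and-forget pattern in the snippet above (illustrative sketch, names are stand-ins, not Impala code): a stage chained with thenRun(...) is skipped if the async task completes exceptionally, which would leak the single-thread executor; whenComplete(...) runs on success and failure, so cleanup is guaranteed either way.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: guaranteed cleanup for an async fire-and-forget task.
public class AsyncCleanup {
    static boolean fireAndCleanup(Runnable task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        final boolean[] cleaned = {false};
        CompletableFuture<Void> f = CompletableFuture.runAsync(task, pool)
            .whenComplete((v, err) -> {   // err != null when the task threw
                cleaned[0] = true;        // stand-in for msClient.close()
                pool.shutdown();
            });
        try { f.join(); } catch (RuntimeException ignored) { }
        return cleaned[0] && pool.isShutdown();
    }

    public static void main(String[] args) {
        // Cleanup runs even when the task throws:
        System.out.println(fireAndCleanup(() -> { throw new RuntimeException("boom"); })); // true
        // and on normal completion:
        System.out.println(fireAndCleanup(() -> { }));                                     // true
    }
}
```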

 


was (Author: JIRAUSER289149):
Hello !, [~csringhofer] I'm only using non-transactional tables right now and 
it's equally slow. I tried using the Bulk API last week, but the improvement 
was very small. Then I referenced the code in impala3 and modified it to be an 
asynchronous call. The execution speed is greatly improved, but I don't know if 
there is any risk. 
{code:java}
public static List<Long> fireInsertEvents(MetaStoreClient msClient,
    TableInsertEventInfo insertEventInfo, String dbName, String tableName) {
  if (!insertEventInfo.isTransactional()) {
    LOG.info("fire the insert events asynchronously.");
    ExecutorService fireInsertEventThread = Executors.newSingleThreadExecutor();
    CompletableFuture.runAsync(() -> {
      try {
        fireInsertEventHelper(msClient.getHiveClient(),
            insertEventInfo.getInsertEventReqData(),
            insertEventInfo.getInsertEventPartVals(), dbName, tableName);
      } catch (Exception e) {
        LOG.error("failed to async call fireInsertEventHelper", e);
      } finally {
        msClient.close();
        LOG.info("fire the insert events asynchronously end.");
      }
    }, fireInsertEventThread).thenRun(fireInsertEventThread::shutdown);
  } else {
    Stopwatch sw = Stopwatch.createStarted();
    try {
      fireInsertTransactionalEventHelper(msClient.getHiveClient(),
          insertEventInfo, dbName, tableName);
    } catch (Exception e) {
      LOG.error("Failed to fire insert event. Some tables might not be"
          + " refreshed on other impala clusters.", e);
    } finally {
      LOG.info("Time taken to fire insert events on table {}.{}: {} msec",
          dbName, tableName, sw.stop().elapsed(TimeUnit.MILLISECONDS));
      msClient.close();
    }
  }
  return Collections.emptyList();
}
{code}
     I am not an expert in impala. I hope to get your guidance. Thank you!

 

> FireInsertEvents function can be very slow for tables with large number of 
> partitions.
> --
>
> Key: IMPALA-11677
> URL: https://issues.apache.org/jira/browse/IMPALA-11677
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 4.1.0
>Reporter: Qihong Jiang
>Assignee: Qihong Jiang
>Priority: Major
>
> In src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java, 
> the fireInsertEvents function can be very slow for tables with a large number 
> of partitions, so we should use asynchronous calls, just like in impala-3.x.

[jira] [Commented] (IMPALA-11677) FireInsertEvents function can be very slow for tables with large number of partitions.

2022-10-19 Thread Qihong Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620713#comment-17620713
 ] 

Qihong Jiang commented on IMPALA-11677:
---

Hello [~csringhofer]! I'm only using non-transactional tables right now and 
it's equally slow. I tried using the Bulk API last week, but the improvement 
was very small. Then I referenced the code in impala-3.x and modified it to be an 
asynchronous call. The execution speed is greatly improved, but I don't know if 
there is any risk. 
{code:java}
public static List<Long> fireInsertEvents(MetaStoreClient msClient,
    TableInsertEventInfo insertEventInfo, String dbName, String tableName) {
  if (!insertEventInfo.isTransactional()) {
    LOG.info("fire the insert events asynchronously.");
    ExecutorService fireInsertEventThread = Executors.newSingleThreadExecutor();
    CompletableFuture.runAsync(() -> {
      try {
        fireInsertEventHelper(msClient.getHiveClient(),
            insertEventInfo.getInsertEventReqData(),
            insertEventInfo.getInsertEventPartVals(), dbName, tableName);
      } catch (Exception e) {
        LOG.error("failed to async call fireInsertEventHelper", e);
      }
    }, fireInsertEventThread).thenRun(() -> {
      LOG.info("fire the insert events asynchronously end.");
      msClient.close();
      fireInsertEventThread.shutdown();
    });
  } else {
    Stopwatch sw = Stopwatch.createStarted();
    try {
      fireInsertTransactionalEventHelper(msClient.getHiveClient(),
          insertEventInfo, dbName, tableName);
    } catch (Exception e) {
      LOG.error("Failed to fire insert event. Some tables might not be"
          + " refreshed on other impala clusters.", e);
    } finally {
      LOG.info("Time taken to fire insert events on table {}.{}: {} msec",
          dbName, tableName, sw.stop().elapsed(TimeUnit.MILLISECONDS));
      msClient.close();
    }
  }
  return Collections.emptyList();
}
{code}
     I am not an expert in impala. I hope to get your guidance. Thank you!



 

> FireInsertEvents function can be very slow for tables with large number of 
> partitions.
> --
>
> Key: IMPALA-11677
> URL: https://issues.apache.org/jira/browse/IMPALA-11677
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 4.1.0
>Reporter: Qihong Jiang
>Assignee: Qihong Jiang
>Priority: Major
>
> In src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java, 
> the fireInsertEvents function can be very slow for tables with a large number 
> of partitions, so we should use asynchronous calls, just like in impala-3.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-11657) build-all-flag-combinations.sh should tolerate git-reset failures

2022-10-19 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-11657.
-
Fix Version/s: Impala 4.2.0
   Resolution: Fixed

> build-all-flag-combinations.sh should tolerate git-reset failures
> -
>
> Key: IMPALA-11657
> URL: https://issues.apache.org/jira/browse/IMPALA-11657
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
> Fix For: Impala 4.2.0
>
>
> When building all options from a tarball using 
> bin/jenkins/build-all-flag-combinations.sh, the build finally fails with 
> this:
> {noformat}
> fatal: Not a git repository (or any of the parent directories): .git
> ERROR in ./bin/jenkins/build-all-flag-combinations.sh at line 130: git reset 
> --hard HEAD
> Generated: 
> /home/ubuntu/apache-impala-4.1.1/logs/extra_junit_xml_logs/generate_junitxml.buildall.build-all-flag-combinations.20221013_02_26_28.xml
> ./bin/jenkins/build-all-flag-combinations.sh: Cleaning up temporary directory 
> {noformat}
> The corresponding code snippet:
> {code:bash}
>   # Reset the files changed by mvn versions:set
>   git reset --hard HEAD{code}
> [https://github.com/apache/impala/blob/4.1.1-rc2/bin/jenkins/build-all-flag-combinations.sh#L130]
> The failure of git-reset should be ignored. CC [~joemcdonnell] 
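
A sketch of a tolerant version (the guard is an assumption for illustration, not the committed fix): probe for a git work tree before resetting, so a tarball build does not abort.

```shell
# Simulate a tarball build dir (no .git), then guard the reset so the build
# script does not abort when git-reset cannot run.
cd "$(mktemp -d)"
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git reset --hard HEAD   # reset the files changed by `mvn versions:set`
else
  echo "Not a git repository; skipping git reset"
fi
```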



--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Commented] (IMPALA-11673) Exclude spring-jcl from the classpath

2022-10-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620625#comment-17620625
 ] 

ASF subversion and git services commented on IMPALA-11673:
--

Commit 0935c382e37644e193512494ac7a906b5d4e2e13 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0935c382e ]

IMPALA-11673: Exclude spring-jcl from the Java build

spring-core has a dependency on spring-jcl, which is
a jar that implements the same classes as commons-logging.
Having both on the classpath can lead to conflicts and
errors. Since commons-logging is already on the classpath
via other dependencies, this excludes spring-jcl and adds
it to the list of banned dependencies.

Testing:
 - Ran core job
 - Verified spring-jcl is not present

Change-Id: Ifeb741788662ec1b46303154b40109a4eef67005
Reviewed-on: http://gerrit.cloudera.org:8080/19154
Reviewed-by: Michael Smith 
Reviewed-by: Wenzhe Zhou 
Tested-by: Impala Public Jenkins 


> Exclude spring-jcl from the classpath
> -
>
> Key: IMPALA-11673
> URL: https://issues.apache.org/jira/browse/IMPALA-11673
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.2.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>
> The dependency on spring-core brings in spring-jcl, which is intended as a 
> replacement for the commons-logging framework. spring-jcl implements classes 
> that overlap with commons-logging (which Impala also has on its classpath).
> https://github.com/spring-projects/spring-framework/issues/20611#issuecomment-453462024
> It doesn't seem useful to have spring-jcl around, so let's exclude it to 
> avoid this overlap.
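
A sketch of what the exclusion could look like in the pom (the coordinates and version property of the direct dependency are assumptions; only the spring-jcl exclusion is the point):

```xml
<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-core</artifactId>
  <version>${spring.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.springframework</groupId>
      <artifactId>spring-jcl</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```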



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-11671) run-all-tests.sh with Ozone fails listing DFS files

2022-10-19 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-11671:
---
Labels: ci-failure  (was: )

> run-all-tests.sh with Ozone fails listing DFS files
> ---
>
> Key: IMPALA-11671
> URL: https://issues.apache.org/jira/browse/IMPALA-11671
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>  Labels: ci-failure
>
> {code}
> hdfs dfs -ls -R ${FILESYSTEM_PREFIX}/test-warehouse \
>   > ${IMPALA_LOGS_DIR}/file-list-end-${i}.log 2>&1
> {code}
> in run-all-tests.sh fails when testing with Ozone. Errors like
> {code}
> ls: Unable to get file status: volume: impala bucket: test-warehouse key: 
> .Trash/jenkins/Current/impala/test-warehouse/test_insert_select_exprs_5162abf6.db/overflowed_decimal_tbl_2/_impala_insert_staging/c94353a630109577_202e9020
> ls: Unable to get file status: volume: impala bucket: test-warehouse key: 
> .Trash/jenkins/Current/impala/test-warehouse/tpch.db/ctas_cancel/_impala_insert_staging/454c6614d6c20d17_94fac58c
> ls: Unable to get file status: volume: impala bucket: test-warehouse key: 
> .Trash/jenkins/Current/impala/test-warehouse/tpch.db/ctas_cancel1665933223846/_impala_insert_staging/7a46352d463f457e_fcabb6e0
> ls: Unable to get file status: volume: impala bucket: test-warehouse key: 
> .Trash/jenkins/Current/impala/test-warehouse/tpch.db/ctas_cancel1665933261855/_impala_insert_staging/d24d1c35c2a9360c_90e32ce2
> {code}
> are visible in file-list-end-1.log. Current theory is that Ozone is cleaning 
> up trash files while we list them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-11674) Fix IsPeekTimeoutTException and IsReadTimeoutTException for thrift-0.16.0

2022-10-19 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620609#comment-17620609
 ] 

Riza Suminto edited comment on IMPALA-11674 at 10/19/22 9:25 PM:
-

Filed a patch here: [https://gerrit.cloudera.org/c/19157/]

I tried to exercise the failure scenario as a custom_cluster test within 
tests/custom_cluster/test_client_ssl.py, but I was unable to do so because 
impala-shell seems to re-establish the connection when run from pytest. 
Running the scenario with an interactive impala-shell can reproduce the issue.


was (Author: rizaon):
Filed a patch here: [https://gerrit.cloudera.org/c/19157/]

I tried to exercise the failure scenario as custom_cluster test, but unable to 
do so because impala-shell seems to re-establish the connection if it were run 
from pytest. Running the scenario with interactive impala-shell can reproduce 
the issue.

> Fix IsPeekTimeoutTException and IsReadTimeoutTException for thrift-0.16.0
> -
>
> Key: IMPALA-11674
> URL: https://issues.apache.org/jira/browse/IMPALA-11674
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0
>Reporter: Wenzhe Zhou
>Assignee: Riza Suminto
>Priority: Major
>
> IMPALA-7825 upgraded the Thrift version from 0.9.3 to 0.11.0, and IMPALA-11384 
> upgraded the C++ Thrift components from 0.11.0 to 0.16.0. 
> The functions IsPeekTimeoutTException() and IsReadTimeoutTException() in 
> be/src/rpc/thrift-util.cc make assumptions about the implementation of read(), 
> peek(), write() and write_partial() in TSocket.cpp and TSSLSocket.cpp. The 
> functions read() and peek() in TSSLSocket.cpp were changed in versions 0.11.0 
> and 0.16.0 to throw different exceptions on timeout. This causes 
> IsPeekTimeoutTException() and IsReadTimeoutTException() to return wrong values 
> after the Thrift upgrade, which in turn causes TAcceptQueueServer::Peek() to 
> rethrow the exception to its caller TAcceptQueueServer::run(), making 
> TAcceptQueueServer::run() close the connection.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Comment Edited] (IMPALA-11665) Min/Max filter could crash in fast code path for string data type

2022-10-19 Thread Qifan Chen (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620584#comment-17620584
 ] 

Qifan Chen edited comment on IMPALA-11665 at 10/19/22 8:39 PM:
---

Set up a table with NULLs and empty strings in the STRING column null_str. When 
loading the table, I configured it once with 1 page and once with 3 pages. 

Ran the query in the DML section below and observed the following when the fast 
code path was taken:
1. NULLs are not part of the page min/max stats or the min/max filter stats at 
all, which is good;
2. The runtime filtering works as designed. 
DDL


{code:java}
create table null_pq (
id string, 
null_str string,
null_int int
) 
sort by (null_str) 
stored as parquet
;
{code}


data loading:


{code:java}
set PARQUET_PAGE_ROW_COUNT_LIMIT=12;
insert into null_pq values
('a', null, 1),
('b', null, 2),
('c',null,3),
('aa', 'a', 1),
('ab', 'b', 2),
('ac','c',3),
('ad', '', 4),
('ae', '', 5),
('ac','',6);


{code}

1 page case (set PARQUET_PAGE_ROW_COUNT_LIMIT=12)



{code:java}
[14:11:06 qchen@qifan-10229: src] pqtools dump 
hdfs://localhost:20500/test-warehouse/null_pq/9341bc3df646c530-9701c2fc_162963959_data.0.parq
22/10/17 14:23:15 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
row group 0 

id:BINARY SNAPPY DO:4 FPO:56 SZ:85/89/1.05 VC:9 ENC:RLE,PLAIN_DICTIONARY
null_str:  BINARY SNAPPY DO:146 FPO:180 SZ:64/60/0.94 VC:9 ENC:RLE,PLA [more]...
null_int:  INT32 SNAPPY DO:273 FPO:312 SZ:72/68/0.94 VC:9 ENC:RLE,PLAI [more]...

id TV=9 RL=0 DL=1 DS:   8 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:9

null_str TV=9 RL=0 DL=1 DS: 4 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:9

null_int TV=9 RL=0 DL=1 DS: 6 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:9

BINARY id 

*** row group 1 of 1, values 1 to 9 *** 
value 1: R:0 D:1 V:ad
value 2: R:0 D:1 V:ae
value 3: R:0 D:1 V:ac
value 4: R:0 D:1 V:aa
value 5: R:0 D:1 V:ab
value 6: R:0 D:1 V:ac
value 7: R:0 D:1 V:a
value 8: R:0 D:1 V:b
value 9: R:0 D:1 V:c

BINARY null_str 

*** row group 1 of 1, values 1 to 9 *** 
value 1: R:0 D:1 V:
value 2: R:0 D:1 V:
value 3: R:0 D:1 V:
value 4: R:0 D:1 V:a
value 5: R:0 D:1 V:b
value 6: R:0 D:1 V:c
value 7: R:0 D:0 V:
value 8: R:0 D:0 V:
value 9: R:0 D:0 V:

INT32 null_int 

*** row group 1 of 1, values 1 to 9 *** 
value 1: R:0 D:1 V:4
value 2: R:0 D:1 V:5
value 3: R:0 D:1 V:6
value 4: R:0 D:1 V:1
value 5: R:0 D:1 V:2
value 6: R:0 D:1 V:3
value 7: R:0 D:1 V:1
value 8: R:0 D:1 V:2
value 9: R:0 D:1 V:3
[14:23:16 qchen@qifan-10229: src] 
{code}




3 page case (set PARQUET_PAGE_ROW_COUNT_LIMIT=4)


{code:java}
[13:50:22 qchen@qifan-10229: cluster] pqtools dump 
hdfs://localhost:20500/test-warehouse/null_pq/aa449f944bb9d005-7df200e3_811956887_data.0.parq
22/10/17 13:51:02 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
row group 0 

id:BINARY SNAPPY DO:4 FPO:56 SZ:139/139/1.00 VC:9 ENC:RLE,PLAI [more]...
null_str:  BINARY SNAPPY DO:200 FPO:234 SZ:116/108/0.93 VC:9 ENC:RLE,P [more]...
null_int:  INT32 SNAPPY DO:388 FPO:427 SZ:126/118/0.94 VC:9 ENC:RLE,PL [more]...

id TV=9 RL=0 DL=1 DS:   8 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 1:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 2:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:1

null_str TV=9 RL=0 DL=1 DS: 4 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 1:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 2:  DLE:RLE RLE:RLE VLE:PLAIN ST:[no stat 
[more]... VC:1

null_int TV=9 RL=0 DL=1 DS: 6 DE:PLAIN_DICTIONARY

[jira] [Comment Edited] (IMPALA-11665) Min/Max filter could crash in fast code path for string data type

2022-10-19 Thread Qifan Chen (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620584#comment-17620584
 ] 

Qifan Chen edited comment on IMPALA-11665 at 10/19/22 8:39 PM:
---

Set up a table with NULLs and empty strings in the STRING column null_str. When 
loading the table, configured it to produce one data page per column chunk in 
one run and three pages in another. 

Ran the query in the DML section below and observed the following when the fast 
code path was taken:
1. NULLs are not part of the page min/max stats or the min/max filter stats at 
all, which is good;
2. The runtime filtering works as designed. 
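Observation 1 can be sketched with a small model of how page-level min/max statistics treat NULLs: only non-null values contribute, so an empty string is a legitimate minimum while an all-NULL page simply has no stats. This is an illustrative sketch, not Impala's actual implementation.

```python
# Hedged sketch: NULLs are excluded from page min/max statistics, while
# empty strings participate like any other value. This mirrors the observed
# Parquet behavior above, not Impala's real code.

def page_min_max(values):
    """Return (min, max) over non-NULL values, or None if all are NULL."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None  # page carries no stats, like the all-NULL page above
    return (min(non_null), max(non_null))

# The null_str column loaded below: three NULLs, 'a', 'b', 'c',
# and three empty strings.
null_str = [None, None, None, 'a', 'b', 'c', '', '', '']

print(page_min_max(null_str))      # the empty string is the minimum
print(page_min_max([None, None]))  # an all-NULL page has no stats
```

Note how the empty string, unlike NULL, does show up as a page minimum; a min/max filter therefore has to distinguish the two cases.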

DDL


{code:java}
create table null_pq (
id string, 
null_str string,
null_int int
) 
sort by (null_str) 
stored as parquet
;
{code}


data loading:


{code:java}
set PARQUET_PAGE_ROW_COUNT_LIMIT=12;
insert into null_pq values
('a', null, 1),
('b', null, 2),
('c',null,3),
('aa', 'a', 1),
('ab', 'b', 2),
('ac','c',3),
('ad', '', 4),
('ae', '', 5),
('ac','',6);


{code}

1 page case (set PARQUET_PAGE_ROW_COUNT_LIMIT=12)



{code:java}
[14:11:06 qchen@qifan-10229: src] pqtools dump 
hdfs://localhost:20500/test-warehouse/null_pq/9341bc3df646c530-9701c2fc_162963959_data.0.parq
22/10/17 14:23:15 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
row group 0 

id:BINARY SNAPPY DO:4 FPO:56 SZ:85/89/1.05 VC:9 ENC:RLE,PLAIN_DICTIONARY
null_str:  BINARY SNAPPY DO:146 FPO:180 SZ:64/60/0.94 VC:9 ENC:RLE,PLA [more]...
null_int:  INT32 SNAPPY DO:273 FPO:312 SZ:72/68/0.94 VC:9 ENC:RLE,PLAI [more]...

id TV=9 RL=0 DL=1 DS:   8 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:9

null_str TV=9 RL=0 DL=1 DS: 4 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:9

null_int TV=9 RL=0 DL=1 DS: 6 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:9

BINARY id 

*** row group 1 of 1, values 1 to 9 *** 
value 1: R:0 D:1 V:ad
value 2: R:0 D:1 V:ae
value 3: R:0 D:1 V:ac
value 4: R:0 D:1 V:aa
value 5: R:0 D:1 V:ab
value 6: R:0 D:1 V:ac
value 7: R:0 D:1 V:a
value 8: R:0 D:1 V:b
value 9: R:0 D:1 V:c

BINARY null_str 

*** row group 1 of 1, values 1 to 9 *** 
value 1: R:0 D:1 V:
value 2: R:0 D:1 V:
value 3: R:0 D:1 V:
value 4: R:0 D:1 V:a
value 5: R:0 D:1 V:b
value 6: R:0 D:1 V:c
value 7: R:0 D:0 V:
value 8: R:0 D:0 V:
value 9: R:0 D:0 V:

INT32 null_int 

*** row group 1 of 1, values 1 to 9 *** 
value 1: R:0 D:1 V:4
value 2: R:0 D:1 V:5
value 3: R:0 D:1 V:6
value 4: R:0 D:1 V:1
value 5: R:0 D:1 V:2
value 6: R:0 D:1 V:3
value 7: R:0 D:1 V:1
value 8: R:0 D:1 V:2
value 9: R:0 D:1 V:3
[14:23:16 qchen@qifan-10229: src] 
{code}




3 pages case (set PARQUET_PAGE_ROW_COUNT_LIMIT=4)


{code:java}
pqtools dump 
hdfs://localhost:20500/test-warehouse/null_pq/aa449f944bb9d005-7df200e3_811956887_data.0.parq

[13:50:22 qchen@qifan-10229: cluster] pqtools dump 
hdfs://localhost:20500/test-warehouse/null_pq/aa449f944bb9d005-7df200e3_811956887_data.0.parq
22/10/17 13:51:02 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
row group 0 

id:BINARY SNAPPY DO:4 FPO:56 SZ:139/139/1.00 VC:9 ENC:RLE,PLAI [more]...
null_str:  BINARY SNAPPY DO:200 FPO:234 SZ:116/108/0.93 VC:9 ENC:RLE,P [more]...
null_int:  INT32 SNAPPY DO:388 FPO:427 SZ:126/118/0.94 VC:9 ENC:RLE,PL [more]...

id TV=9 RL=0 DL=1 DS:   8 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 1:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 2:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:1

null_str TV=9 RL=0 DL=1 DS: 4 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 1:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 2:  DLE:RLE RLE:RLE VLE:PLAIN ST:[no stat 
[more]... VC:1

null_int TV=9 RL=0 DL=1 DS: 6 DE:PLAIN_DICTIONARY

[jira] [Commented] (IMPALA-11665) Min/Max filter could crash in fast code path for string data type

2022-10-19 Thread Qifan Chen (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620587#comment-17620587
 ] 

Qifan Chen commented on IMPALA-11665:
-

It may be helpful to obtain the parquet data file(s) involved in the crash and 
to try the offending query afterwards. 

> Min/Max filter could crash in fast code path for string data type
> -
>
> Key: IMPALA-11665
> URL: https://issues.apache.org/jira/browse/IMPALA-11665
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Abhishek Rawat
>Assignee: Qifan Chen
>Priority: Critical
>
> The impalad logs show that memcmp failed due to a segfault:
> {code:java}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f0396c3ff22, pid=1, tid=0x7f023f365700
> #
> # JRE version: OpenJDK Runtime Environment (8.0_332-b09) (build 1.8.0_332-b09)
> # Java VM: OpenJDK 64-Bit Server VM (25.332-b09 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # C  [libc.so.6+0x16af22]  __memcmp_sse4_1+0xd42 {code}
> Resolved Stack Trace for the crashed thread:
> {code:java}
> Thread 530 (crashed)
>  0  libc-2.17.so + 0x16af22
>     rax = 0x7f61567715f0   rdx = 0x000a
>     rcx = 0x7f62ae04cf22   rbx = 0x
>     rsi = 0x5d1e900a   rdi = 0x000a
>     rbp = 0x7f6156771560   rsp = 0x7f6156771548
>      r8 = 0x034d40f0    r9 = 0x7f62ae022e90
>     r10 = 0x0498ff6c   r11 = 0x7f62ae06f590
>     r12 = 0x000a   r13 = 0x1a9678e8
>     r14 = 0x7f6156771730   r15 = 0x01b1f380
>     rip = 0x7f62ae04cf22
>     Found by: given as instruction pointer in context
>  1  
> impalad!impala::HdfsParquetScanner::CollectSkippedPageRangesForSortedColumn(impala::MinMaxFilter
>  const*, impala::ColumnType const&, 
> std::vector, 
> std::allocator >, std::allocator std::char_traits, std::allocator > > > const&, 
> std::vector, 
> std::allocator >, std::allocator std::char_traits, std::allocator > > > const&, int, int, 
> std::vector >*) 
> [hdfs-parquet-scanner.cc : 1388 + 0x3]
>     rbp = 0x7f6156771650   rsp = 0x7f6156771570
>     rip = 0x01b10305
>     Found by: previous frame's frame pointer
>  2  impalad!impala::HdfsParquetScanner::SkipPagesBatch(parquet::RowGroup&, 
> impala::ColumnStatsReader const&, parquet::ColumnIndex const&, int, int, 
> impala::ColumnType const&, int, parquet::ColumnChunk const&, 
> impala::MinMaxFilter const*, std::vector std::allocator >*, int*) [hdfs-parquet-scanner.cc : 1230 + 
> 0x34]
>     rbx = 0x7f61567716f0   rbp = 0x7f61567717e0
>     rsp = 0x7f6156771660   r12 = 0x7f6156771710
>     r13 = 0x7f6156771950   r14 = 0x1a9678e8
>     r15 = 0x7f6156771920   rip = 0x01b14838
>     Found by: call frame info
>  3  
> impalad!impala::HdfsParquetScanner::FindSkipRangesForPagesWithMinMaxFilters(std::vector  std::allocator >*) [hdfs-parquet-scanner.cc : 1528 + 0x57]
>     rbx = 0x004a   rbp = 0x7f6156771b10
>     rsp = 0x7f61567717f0   r12 = 0x2c195800
>     r13 = 0x2aa115d0   r14 = 0x0001
>     r15 = 0x0049   rip = 0x01b1cf1a
>     Found by: call frame info
>  4  impalad!impala::HdfsParquetScanner::EvaluatePageIndex() 
> [hdfs-parquet-scanner.cc : 1600 + 0x19]
>     rbx = 0x7f6156771c30   rbp = 0x7f6156771cf0
>     rsp = 0x7f6156771b20   r12 = 0x2c195800
>     r13 = 0x7f6156771de8   r14 = 0x104528a0
>     r15 = 0x7f6156771df0   rip = 0x01b1d9dd
>     Found by: call frame info
>  5  impalad!impala::HdfsParquetScanner::ProcessPageIndex() 
> [hdfs-parquet-scanner.cc : 1318 + 0xb]
>     rbx = 0x2c195800   rbp = 0x7f6156771d70
>     rsp = 0x7f6156771d00   r12 = 0x7f6156771d10
>     r13 = 0x7f6156771de8   r14 = 0x104528a0
>     r15 = 0x7f6156771df0   rip = 0x01b1dd0b
>     Found by: call frame info
>  6  impalad!impala::HdfsParquetScanner::NextRowGroup() 
> [hdfs-parquet-scanner.cc : 934 + 0xf]
>     rbx = 0x318ce040   rbp = 0x7f6156771e40
>     rsp = 0x7f6156771d80   r12 = 0x2c195800
>     r13 = 0x7f6156771de8   r14 = 0x104528a0
>     r15 = 0x7f6156771df0   rip = 0x01b1e1b4
>     Found by: call frame info
>  7  impalad!impala::HdfsParquetScanner::GetNextInternal(impala::RowBatch*) 
> [hdfs-parquet-scanner.cc : 504 + 0xb]
>     rbx = 0x2c195800   rbp = 0x7f6156771ec0
>     rsp = 0x7f6156771e50   r12 = 0xc1ca4d00
>     r13 = 0x7f6156771e78   r14 = 0x7f6156771e80
>     r15 = 0xaaab   rip = 0x01b1ed5b
>     Found by: call frame info
>  8  

[jira] [Commented] (IMPALA-11665) Min/Max filter could crash in fast code path for string data type

2022-10-19 Thread Qifan Chen (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620584#comment-17620584
 ] 

Qifan Chen commented on IMPALA-11665:
-

Set up a table with NULLs and empty strings in the STRING column. When loading 
it, configured the table to produce one data page per column chunk in one run 
and three pages in another. 

Ran the query in the DML section below and observed the following when the fast 
code path is taken:
1. NULLs are not part of the page min/max stats or the min/max filter stats at 
all, which is good;
2. The runtime filtering works as designed. 

DDL


{code:java}
create table null_pq (
id string, 
null_str string,
null_int int
) 
sort by (null_str) 
stored as parquet
;
{code}


data loading:


{code:java}
set PARQUET_PAGE_ROW_COUNT_LIMIT=12;
insert into null_pq values
('a', null, 1),
('b', null, 2),
('c',null,3),
('aa', 'a', 1),
('ab', 'b', 2),
('ac','c',3),
('ad', '', 4),
('ae', '', 5),
('ac','',6);


{code}

1 page case (set PARQUET_PAGE_ROW_COUNT_LIMIT=12)



{code:java}
[14:11:06 qchen@qifan-10229: src] pqtools dump 
hdfs://localhost:20500/test-warehouse/null_pq/9341bc3df646c530-9701c2fc_162963959_data.0.parq
22/10/17 14:23:15 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
row group 0 

id:BINARY SNAPPY DO:4 FPO:56 SZ:85/89/1.05 VC:9 ENC:RLE,PLAIN_DICTIONARY
null_str:  BINARY SNAPPY DO:146 FPO:180 SZ:64/60/0.94 VC:9 ENC:RLE,PLA [more]...
null_int:  INT32 SNAPPY DO:273 FPO:312 SZ:72/68/0.94 VC:9 ENC:RLE,PLAI [more]...

id TV=9 RL=0 DL=1 DS:   8 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:9

null_str TV=9 RL=0 DL=1 DS: 4 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:9

null_int TV=9 RL=0 DL=1 DS: 6 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:9

BINARY id 

*** row group 1 of 1, values 1 to 9 *** 
value 1: R:0 D:1 V:ad
value 2: R:0 D:1 V:ae
value 3: R:0 D:1 V:ac
value 4: R:0 D:1 V:aa
value 5: R:0 D:1 V:ab
value 6: R:0 D:1 V:ac
value 7: R:0 D:1 V:a
value 8: R:0 D:1 V:b
value 9: R:0 D:1 V:c

BINARY null_str 

*** row group 1 of 1, values 1 to 9 *** 
value 1: R:0 D:1 V:
value 2: R:0 D:1 V:
value 3: R:0 D:1 V:
value 4: R:0 D:1 V:a
value 5: R:0 D:1 V:b
value 6: R:0 D:1 V:c
value 7: R:0 D:0 V:
value 8: R:0 D:0 V:
value 9: R:0 D:0 V:

INT32 null_int 

*** row group 1 of 1, values 1 to 9 *** 
value 1: R:0 D:1 V:4
value 2: R:0 D:1 V:5
value 3: R:0 D:1 V:6
value 4: R:0 D:1 V:1
value 5: R:0 D:1 V:2
value 6: R:0 D:1 V:3
value 7: R:0 D:1 V:1
value 8: R:0 D:1 V:2
value 9: R:0 D:1 V:3
[14:23:16 qchen@qifan-10229: src] 
{code}




3 pages case (set PARQUET_PAGE_ROW_COUNT_LIMIT=4)


{code:java}
pqtools dump 
hdfs://localhost:20500/test-warehouse/null_pq/aa449f944bb9d005-7df200e3_811956887_data.0.parq

[13:50:22 qchen@qifan-10229: cluster] pqtools dump 
hdfs://localhost:20500/test-warehouse/null_pq/aa449f944bb9d005-7df200e3_811956887_data.0.parq
22/10/17 13:51:02 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
row group 0 

id:BINARY SNAPPY DO:4 FPO:56 SZ:139/139/1.00 VC:9 ENC:RLE,PLAI [more]...
null_str:  BINARY SNAPPY DO:200 FPO:234 SZ:116/108/0.93 VC:9 ENC:RLE,P [more]...
null_int:  INT32 SNAPPY DO:388 FPO:427 SZ:126/118/0.94 VC:9 ENC:RLE,PL [more]...

id TV=9 RL=0 DL=1 DS:   8 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 1:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 2:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:1

null_str TV=9 RL=0 DL=1 DS: 4 DE:PLAIN_DICTIONARY

page 0:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 1:  DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY  
[more]... VC:4
page 2:  DLE:RLE RLE:RLE VLE:PLAIN ST:[no stat 
[more]... VC:1

null_int TV=9 RL=0 DL=1 DS: 6 DE:PLAIN_DICTIONARY

page 

[jira] [Resolved] (IMPALA-11581) ALTER TABLE RENAME TO doesn't update transient_lastDdlTime

2022-10-19 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-11581.

Resolution: Fixed

> ALTER TABLE RENAME TO doesn't update transient_lastDdlTime
> --
>
> Key: IMPALA-11581
> URL: https://issues.apache.org/jira/browse/IMPALA-11581
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Daniel Becker
>Priority: Major
>  Labels: ramp-up
>
> ALTER TABLE RENAME TO doesn't update transient_lastDdlTime.
> The following statements behave differently when executed via Hive or Impala:
> {noformat}
> CREATE TABLE rename_from (i int);
> ALTER TABLE rename_from RENAME TO rename_to;
> {noformat}
> During ALTER TABLE ... RENAME TO ... Hive updates transient_lastDdlTime while 
> Impala leaves it unchanged.
> Impala should follow Hive's behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-9487) SHOW and DESCRIBE statements should display EC policies

2022-10-19 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620566#comment-17620566
 ] 

Michael Smith commented on IMPALA-9487:
---

It might also be useful to surface erasure-coded partitions in query profiles.

> SHOW and DESCRIBE statements should display EC policies
> ---
>
> Key: IMPALA-9487
> URL: https://issues.apache.org/jira/browse/IMPALA-9487
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Sahil Takiar
>Priority: Major
>  Labels: observability
>
> Since EC policies can be set per-file, the {{show files}} command should 
> display if a file is an EC file, and what the EC policy is.
> EC policies can be set on a table level directory, so it would be useful if 
> 'describe extended [table-name]' indicated if the table had an EC policy set 
> or not.
> For partitioned tables, {{show partitions}} should list out the EC policy of 
> each partition directory (we already do something similar for HDFS caching).






[jira] [Updated] (IMPALA-11666) Consider revising the warning message when hasCorruptTableStats_ is true for a table

2022-10-19 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-11666:
-
Description: 
Currently, '{{{}hasCorruptTableStats_{}}}' of an HDFS table is set to true when 
one of the following is true in 
[HdfsScanNode.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java].
 # Its '{{{}cardinality_{}}}' is less than -1.
 # The number of rows in one of its partitions is less than -1.
 # The number of rows in one of its partitions is 0 but the size of the 
associated files of this partition is greater than 0.
 # The number of rows in the table is 0 but the size of the associated files of 
this table is greater than 0.
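The four conditions above amount to a small predicate. A hedged sketch follows; the function and parameter names are illustrative, not Impala's actual fields in HdfsScanNode.java.

```python
# Illustrative sketch of when hasCorruptTableStats_ would be set to true;
# names are hypothetical, not Impala's real code.

def has_corrupt_stats(cardinality, partitions, table_file_bytes):
    """partitions: list of (num_rows, file_bytes) pairs per partition."""
    if cardinality < -1:                       # condition 1
        return True
    table_rows = 0
    for num_rows, file_bytes in partitions:
        if num_rows < -1:                      # condition 2
            return True
        if num_rows == 0 and file_bytes > 0:   # condition 3
            return True
        table_rows += max(num_rows, 0)
    # condition 4: stats claim 0 rows but the table's files are non-empty
    return table_rows == 0 and table_file_bytes > 0

# The ACID-delete scenario described below: stats report 0 rows, yet the
# ORC delete-delta files give the table a positive size.
print(has_corrupt_stats(0, [(0, 0)], 1024))  # flagged, though stats are fine
```

Condition 4 is exactly the case the transactional-table reproduction below triggers, which is why a table with perfectly valid statistics can still be flagged.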

For such a table, the {{EXPLAIN}} statement for queries involving the table 
would contain the message of "{{{}WARNING: The following tables have 
potentially corrupt table statistics. Drop and re-compute statistics to resolve 
this problem.{}}}"

The warning message may be a bit too scary for an Impala user especially if we 
consider the fact that a table without corrupt statistics could indeed have its 
'{{{}hasCorruptTableStats_{}}}' set to true by Impala's frontend.

Specifically, a table without corrupt statistics but having its 
'{{{}hasCorruptTableStats_{}}}' set to true could be created as follows after 
starting the Impala cluster.
 # Execute on the command line "{{{}beeline -u 
"jdbc:hive2://localhost:11050/default"{}}}" to enter beeline.
 # Create a transactional table in beeline via "{{{}create table 
test_db.test_tbl_01 (id int, name string) stored as orc tblproperties 
('transactional'='true'){}}}".
 # Insert a row into the table just created in beeline via "{{{}insert into 
table test_db.test_tbl_01 values (1, "Alex");{}}}".
 # Delete the row just inserted in beeline via "{{{}delete from 
test_db.test_tbl_01 where id = 1{}}}".
# In Impala shell, execute "{{compute stats test_db.test_tbl_01}}".
 # In Impala shell, execute "{{{}explain select * from test_db.test_tbl_01{}}}" 
to verify that the warning message described above appears in the output.

The table '{{{}test_tbl_01{}}}' above has 0 rows but the associated file size is 
greater than 0.

It may be better that we revise the warning message to something less scary as 
shown below.
{code:java}
The following tables, or some of their partitions, have a row count of 0 (or 
less than -1) in their statistics but a positive total file size.
This does not necessarily imply the existence of corrupt statistics.
If the statistics are corrupt, dropping and re-computing them should resolve 
this problem.
{code}

  was:
Currently, '{{{}hasCorruptTableStats_{}}}' of an HDFS table is set to true when 
one of the following is true in 
[HdfsScanNode.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java].
 # Its '{{{}cardinality_{}}}' is -1.
 # The number of rows in one of its partition is less than -1.
 # The number of rows in one of its partition is 0 but the size of the 
associated files of this partition is greater than 0.
 # The number of rows in the table is 0 but the size of the associated files of 
this table is greater than 0.

For such a table, the {{EXPLAIN}} statement for queries involving the table 
would contain the message of "{{{}WARNING: The following tables have 
potentially corrupt table statistics. Drop and re-compute statistics to resolve 
this problem.{}}}"

The warning message may be a bit too scary for an Impala user especially if we 
consider the fact that a table without corrupt statistics could indeed have its 
'{{{}hasCorruptTableStats_{}}}' set to true by Impala's frontend.

Specifically, a table without corrupt statistics but having its 
'{{{}hasCorruptTableStats_{}}}' set to 1 could be created as follows after 
starting the Impala cluster.
 # Execute on the command line "{{{}beeline -u 
"jdbc:hive2://localhost:11050/default"{}}}" to enter beeline.
 # Create a transactional table in beeline via "{{{}create table 
test_db.test_tbl_01 (id int, name string) stored as orc tblproperties 
('transactional'='true'){}}}".
 # Insert a row into the table just created in beeline via "{{{}insert into 
table test_db.test_tbl_01 (1, "Alex");{}}}".
 # Delete the row just inserted in beeline via "{{{}delete from 
test_db.test_tbl_01 where id = 1{}}}".
# In Impala shell, execute "{{compute stats test_db.test_tbl_01}}".
 # In Impala shell, execute "{{{}explain select * from test_db.test_tbl_01{}}}" 
to verify that the warning message described above appears in the output.

The table '{{{}test_tbl_01{}}}' above has 0 row but the associated file size is 
greater than 0.

It may be better that we revise the warning message to something less scary as 
shown below.
{code:java}
The number of rows in the following tables or in a partition of them has 0 or 
fewer than -1 row but positive total file size.
This does not necessarily 

[jira] [Created] (IMPALA-11678) Upgrade pac4j to 4.5.7

2022-10-19 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-11678:
--

 Summary: Upgrade pac4j to 4.5.7
 Key: IMPALA-11678
 URL: https://issues.apache.org/jira/browse/IMPALA-11678
 Project: IMPALA
  Issue Type: Task
  Components: Frontend
Affects Versions: Impala 4.2.0
Reporter: Joe McDonnell


pac4j has released a few more patchset releases for the 4.5 series, and they 
bump the versions of dependencies to address CVEs (e.g. springframework). Our 
build already overrides the versions for dependencies with CVEs (e.g. 
springframework), but we should adopt the newer 4.5.7 version to bring the 
versions back into alignment.







[jira] [Resolved] (IMPALA-11667) Clean up POMs using dependencyManagement

2022-10-19 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-11667.

Resolution: Fixed

> Clean up POMs using dependencyManagement
> 
>
> Key: IMPALA-11667
> URL: https://issues.apache.org/jira/browse/IMPALA-11667
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.2.0
>
>
> Conflicts in Impala's Java transitive dependencies have historically been 
> handled by excluding them from immediate dependencies and directly including 
> the conflicted package. This requires monitoring for that package in any new 
> dependencies and carrying exclusions across a number of packages.
> Maven provides a dependencyManagement section to directly control versions of 
> transitive dependencies. Use this to clean up many of Impala's exclusions so 
> they're easier to maintain.
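For illustration, pinning a transitive dependency via dependencyManagement looks roughly like this; the coordinates and version are placeholders, not Impala's actual POM entries.

```xml
<!-- Placeholder coordinates; Impala's real POM pins different artifacts. -->
<dependencyManagement>
  <dependencies>
    <!-- Every transitive pull of guava now resolves to this version,
         without per-dependency exclusions scattered across the POMs. -->
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>31.1-jre</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Unlike an exclusions-based approach, this declaration applies even when a new dependency later pulls in the same artifact.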






[jira] [Resolved] (IMPALA-11670) Upgrade components for CVEs, make it easier to override versions

2022-10-19 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-11670.

Resolution: Fixed

> Upgrade components for CVEs, make it easier to override versions
> 
>
> Key: IMPALA-11670
> URL: https://issues.apache.org/jira/browse/IMPALA-11670
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.2.0
>
>
> Upgrade guava and jackson-databind for
> - guava: 
> [CVE-2020-8908|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8908]
> - jackson-databind: 
> [CVE-2022-42003|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42003],
>  
> [CVE-2022-42004|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42004]
> Also make it easier to override versions of commonly updated dependencies by 
> declaring the version as environment variables in impala-config.sh (so that 
> impala-branch-config.sh can override it as needed).
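The override pattern described — a default in impala-config.sh that impala-branch-config.sh or the environment can replace — is the standard shell default-expansion idiom. The variable names and versions below are hypothetical, for illustration only.

```shell
# Hypothetical sketch of the impala-config.sh pattern: keep any version the
# environment (or impala-branch-config.sh) has already exported, otherwise
# fall back to the default. Names and versions are illustrative.
export IMPALA_GUAVA_VERSION="${IMPALA_GUAVA_VERSION:-31.1-jre}"
export IMPALA_JACKSON_DATABIND_VERSION="${IMPALA_JACKSON_DATABIND_VERSION:-2.13.4.2}"
```

A branch config then only needs `export IMPALA_GUAVA_VERSION=<other-version>` before this file is sourced.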








[jira] [Updated] (IMPALA-11670) Upgrade components for CVEs, make it easier to override versions

2022-10-19 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-11670:
---
Fix Version/s: Impala 4.2.0

> Upgrade components for CVEs, make it easier to override versions
> 
>
> Key: IMPALA-11670
> URL: https://issues.apache.org/jira/browse/IMPALA-11670
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.2.0
>
>
> Upgrade guava and jackson-databind for
> - guava: 
> [CVE-2020-8908|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8908]
> - jackson-databind: 
> [CVE-2022-42003|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42003],
>  
> [CVE-2022-42004|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42004]
> Also make it easier to override versions of commonly updated dependencies by 
> declaring the version as environment variables in impala-config.sh (so that 
> impala-branch-config.sh can override it as needed).






[jira] [Updated] (IMPALA-11670) Upgrade components for CVEs, make it easier to override versions

2022-10-19 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-11670:
---
Priority: Critical  (was: Major)

> Upgrade components for CVEs, make it easier to override versions
> 
>
> Key: IMPALA-11670
> URL: https://issues.apache.org/jira/browse/IMPALA-11670
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>
> Upgrade guava and jackson-databind for
> - guava: 
> [CVE-2020-8908|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8908]
> - jackson-databind: 
> [CVE-2022-42003|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42003],
>  
> [CVE-2022-42004|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42004]
> Also make it easier to override versions of commonly updated dependencies by 
> declaring the version as environment variables in impala-config.sh (so that 
> impala-branch-config.sh can override it as needed).






[jira] [Updated] (IMPALA-11667) Clean up POMs using dependencyManagement

2022-10-19 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-11667:
---
Priority: Critical  (was: Major)

> Clean up POMs using dependencyManagement
> 
>
> Key: IMPALA-11667
> URL: https://issues.apache.org/jira/browse/IMPALA-11667
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>
> Conflicts in Impala's Java transitive dependencies have historically been 
> handled by excluding them from immediate dependencies and directly including 
> the conflicted package. This requires monitoring for that package in any new 
> dependencies and carrying exclusions across a number of packages.
> Maven provides a dependencyManagement section to directly control versions of 
> transitive dependencies. Use this to clean up many of Impala's exclusions so 
> they're easier to maintain.






[jira] [Updated] (IMPALA-11667) Clean up POMs using dependencyManagement

2022-10-19 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-11667:
---
Fix Version/s: Impala 4.2.0

> Clean up POMs using dependencyManagement
> 
>
> Key: IMPALA-11667
> URL: https://issues.apache.org/jira/browse/IMPALA-11667
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.2.0
>
>
> Conflicts in Impala's Java transitive dependencies have historically been 
> handled by excluding them from immediate dependencies and directly including 
> the conflicted package. This requires monitoring for that package in any new 
> dependencies and carrying exclusions across a number of packages.
> Maven provides a dependencyManagement section to directly control versions of 
> transitive dependencies. Use this to clean up many of Impala's exclusions so 
> they're easier to maintain.






[jira] [Commented] (IMPALA-11670) Upgrade components for CVEs, make it easier to override versions

2022-10-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620445#comment-17620445
 ] 

ASF subversion and git services commented on IMPALA-11670:
--

Commit 83c5e6e4098d8ed75de09a7e228d6ef10de2ee12 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=83c5e6e40 ]

IMPALA-11670: Upgrade components, add envvars for override

Upgrades guava to 31.1-jre and jackson-databind to 2.13.4.2 to address
CVEs. Adds environment variables for commonly-updated components so they
can be customized via the branch-specific impala-config-branch.sh in a
way that allows both to be updated regularly without merge conflicts.

Also updates httpcomponents.httpcore to 4.4.14 to be consistent with
other httpcomponents libraries included transitively.

Change-Id: I1c2c4481ca3f498abf302aa05361d950b1ed1216
Reviewed-on: http://gerrit.cloudera.org:8080/19147
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 


> Upgrade components for CVEs, make it easier to override versions
> 
>
> Key: IMPALA-11670
> URL: https://issues.apache.org/jira/browse/IMPALA-11670
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> Upgrade guava and jackson-databind for
> - guava: 
> [CVE-2020-8908|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-8908]
> - jackson-databind: 
> [CVE-2022-42003|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42003],
>  
> [CVE-2022-42004|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42004]
> Also make it easier to override versions of commonly updated dependencies by 
> declaring the version as environment variables in impala-config.sh (so that 
> impala-branch-config.sh can override it as needed).






[jira] [Commented] (IMPALA-11667) Clean up POMs using dependencyManagement

2022-10-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620444#comment-17620444
 ] 

ASF subversion and git services commented on IMPALA-11667:
--

Commit 22e5ca3d0a891373251a4f08f3c3824491336e34 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=22e5ca3d0 ]

IMPALA-11667: Clean up Java dependency exclusions

Use dependencyManagement to simplify Java dependencies by directly
controlling versions of transitive dependencies instead of using
exclusions and direct inclusion.

Dependency management specifies versions authoritatively, so redundant
version declarations are also removed.

Change-Id: I424a175135855dcbd38ae432ea111cca5f562633
Reviewed-on: http://gerrit.cloudera.org:8080/19146
Reviewed-by: Joe McDonnell 
Tested-by: Joe McDonnell 


> Clean up POMs using dependencyManagement
> 
>
> Key: IMPALA-11667
> URL: https://issues.apache.org/jira/browse/IMPALA-11667
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> Conflicts in Impala's Java transitive dependencies have historically been 
> handled by excluding them from immediate dependencies and directly including 
> the conflicted package. This requires monitoring for that package in any new 
> dependencies and carrying exclusions across a number of packages.
> Maven provides a dependencyManagement section to directly control versions of 
> transitive dependencies. Use this to clean up many of Impala's exclusions so 
> they're easier to maintain.






[jira] [Commented] (IMPALA-11581) ALTER TABLE RENAME TO doesn't update transient_lastDdlTime

2022-10-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620443#comment-17620443
 ] 

ASF subversion and git services commented on IMPALA-11581:
--

Commit ad438e7e3cc89a7e8511fcfe67c13411de987007 in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ad438e7e3 ]

IMPALA-11581: ALTER TABLE RENAME TO doesn't update transient_lastDdlTime

The following statements behave differently when executed via Hive or
Impala:

CREATE TABLE rename_from (i int);
ALTER TABLE rename_from RENAME TO rename_to;

Hive updates transient_lastDdlTime while Impala leaves it unchanged.

This patch fixes the behaviour of Impala so that it also updates
transient_lastDdlTime.

Testing:
 - Added a test in test_last_ddl_time_update.py that checks that
   transient_lastDdlTime is updated on rename. Refactored the class a
   bit so that the new test fits in easier.

Change-Id: Ib550feaebbad9cf6c9b34ab046293968b157a50c
Reviewed-on: http://gerrit.cloudera.org:8080/19137
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> ALTER TABLE RENAME TO doesn't update transient_lastDdlTime
> --
>
> Key: IMPALA-11581
> URL: https://issues.apache.org/jira/browse/IMPALA-11581
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Daniel Becker
>Priority: Major
>  Labels: ramp-up
>
> ALTER TABLE RENAME TO doesn't update transient_lastDdlTime.
> The following statements behave differently when executed via Hive or Impala:
> {noformat}
> CREATE TABLE rename_from (i int);
> ALTER TABLE rename_from RENAME TO rename_to;
> {noformat}
> During ALTER TABLE ... RENAME TO ... Hive updates transient_lastDdlTime while 
> Impala leaves it unchanged.
> Impala should follow Hive's behavior.






[jira] [Commented] (IMPALA-11677) FireInsertEvents function can be very slow for tables with large number of partitions.

2022-10-19 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620441#comment-17620441
 ] 

Csaba Ringhofer commented on IMPALA-11677:
--

[~hong7j] I have created IMPALA-11108 about switching to a batched API when 
creating write events. It should make FireInsertEvents much faster for Hive 
ACID tables.

Do you think it would help with this issue? Or are non-ACID tables also too 
slow?

> FireInsertEvents function can be very slow for tables with large number of 
> partitions.
> --
>
> Key: IMPALA-11677
> URL: https://issues.apache.org/jira/browse/IMPALA-11677
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 4.1.0
>Reporter: Qihong Jiang
>Assignee: Qihong Jiang
>Priority: Major
>
> In src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java, 
> the fireInsertEvents function can be very slow for tables with a large number 
> of partitions, so we should use asynchronous calls, just like in impala-3.x.






[jira] [Commented] (IMPALA-11677) FireInsertEvents function can be very slow for tables with large number of partitions.

2022-10-19 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620301#comment-17620301
 ] 

Quanlong Huang commented on IMPALA-11677:
-

Thanks for filing this issue! Could you share some of the perf numbers that you 
saw were slow?
Also, let's discuss how you plan to fix this before writing code. That might 
help save back-and-forth rounds in the code review.

BTW, we use Gerrit for code review: 
https://cwiki.apache.org/confluence/display/IMPALA/Using+Gerrit+to+submit+and+review+patches

> FireInsertEvents function can be very slow for tables with large number of 
> partitions.
> --
>
> Key: IMPALA-11677
> URL: https://issues.apache.org/jira/browse/IMPALA-11677
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 4.1.0
>Reporter: Qihong Jiang
>Assignee: Qihong Jiang
>Priority: Major
>
> In src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java, 
> the fireInsertEvents function can be very slow for tables with a large number 
> of partitions, so we should use asynchronous calls, just like in impala-3.x.






[jira] [Assigned] (IMPALA-11677) FireInsertEvents function can be very slow for tables with large number of partitions.

2022-10-19 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-11677:
---

Assignee: Qihong Jiang

> FireInsertEvents function can be very slow for tables with large number of 
> partitions.
> --
>
> Key: IMPALA-11677
> URL: https://issues.apache.org/jira/browse/IMPALA-11677
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 4.1.0
>Reporter: Qihong Jiang
>Assignee: Qihong Jiang
>Priority: Major
>
> In src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java, 
> the fireInsertEvents function can be very slow for tables with a large number 
> of partitions, so we should use asynchronous calls, just like in impala-3.x.






[jira] [Created] (IMPALA-11677) FireInsertEvents function can be very slow for tables with large number of partitions.

2022-10-19 Thread Qihong Jiang (Jira)
Qihong Jiang created IMPALA-11677:
-

 Summary: FireInsertEvents function can be very slow for tables 
with large number of partitions.
 Key: IMPALA-11677
 URL: https://issues.apache.org/jira/browse/IMPALA-11677
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Affects Versions: Impala 4.1.0
Reporter: Qihong Jiang


In src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java, 
the fireInsertEvents function can be very slow for tables with a large number 
of partitions, so we should use asynchronous calls, just like in impala-3.x.
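
As a rough illustration of the asynchronous approach proposed here (a
hypothetical sketch, not Impala's actual MetastoreShim code; fireInsertEvent
below is a stand-in for the real per-partition metastore call), the
per-partition work can be submitted to a thread pool instead of being fired
sequentially:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncInsertEventsSketch {
    // Stand-in for the per-partition insert-event call (hypothetical): a real
    // implementation would contact the metastore here.
    static void fireInsertEvent(String partition) {}

    // Submit one event per partition to a thread pool, then wait for all of
    // them to finish. Returns the number of events fired.
    static int fireInsertEventsAsync(List<String> partitions, int poolSize)
            throws Exception {
        AtomicInteger fired = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String p : partitions) {
                futures.add(pool.submit(() -> {
                    fireInsertEvent(p);
                    fired.incrementAndGet();
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // Propagates any failure from a worker thread.
            }
        } finally {
            pool.shutdown();
        }
        return fired.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < 1000; i++) parts.add("p=" + i);
        System.out.println("events fired: " + fireInsertEventsAsync(parts, 16));
    }
}
```

Waiting on each Future keeps the existing error semantics: a failure in any
partition's event still surfaces to the caller, while the calls themselves
overlap.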







[jira] [Created] (IMPALA-11676) Prettify apache docs

2022-10-19 Thread Tamas Mate (Jira)
Tamas Mate created IMPALA-11676:
---

 Summary: Prettify apache docs
 Key: IMPALA-11676
 URL: https://issues.apache.org/jira/browse/IMPALA-11676
 Project: IMPALA
  Issue Type: Improvement
  Components: Docs
Affects Versions: Impala 4.1.1
Reporter: Tamas Mate
Assignee: Tamas Mate
 Attachments: doc_style_v1.png

The current Apache doc site is not that user friendly; we could make it a bit 
more accessible with a side navigation bar.








[jira] [Updated] (IMPALA-11674) Fix IsPeekTimeoutTException and IsReadTimeoutTException for thrift-0.16.0

2022-10-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-11674:
---
Target Version: Impala 4.2.0

> Fix IsPeekTimeoutTException and IsReadTimeoutTException for thrift-0.16.0
> -
>
> Key: IMPALA-11674
> URL: https://issues.apache.org/jira/browse/IMPALA-11674
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0
>Reporter: Wenzhe Zhou
>Assignee: Riza Suminto
>Priority: Major
>
> IMPALA-7825 upgraded the Thrift version from 0.9.3 to 0.11.0, and IMPALA-11384 
> upgraded the C++ Thrift components from 0.11.0 to 0.16.0. 
> The functions IsPeekTimeoutTException() and IsReadTimeoutTException() in 
> be/src/rpc/thrift-util.cc make assumptions about the implementation of read(), 
> peek(), write() and write_partial() in TSocket.cpp and TSSLSocket.cpp. The 
> functions read() and peek() in TSSLSocket.cpp were changed in versions 0.11.0 
> and 0.16.0 to throw different exceptions on timeout. This causes 
> IsPeekTimeoutTException() and IsReadTimeoutTException() to return the wrong 
> value after the Thrift upgrade, which in turn causes TAcceptQueueServer::Peek() 
> to rethrow the exception to its caller TAcceptQueueServer::run(), making 
> TAcceptQueueServer::run() close the connection.






[jira] [Commented] (IMPALA-11675) Use DATE types in TPC-H schema

2022-10-19 Thread Gabor Kaszab (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620033#comment-17620033
 ] 

Gabor Kaszab commented on IMPALA-11675:
---

This has been on my radar for the last 2 years. :) Thanks for creating a 
ticket, [~stigahuang]

It would also be nice to measure whether this change brings some speedup when 
running our TPC-H test suite.

> Use DATE types in TPC-H schema
> --
>
> Key: IMPALA-11675
> URL: https://issues.apache.org/jira/browse/IMPALA-11675
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Priority: Critical
>
> The TPC-H standard uses DATE type in some columns like o_orderdate, 
> l_shipdate, l_commitdate, l_receiptdate:
> https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.1.pdf
> These columns are currently defined as string types in our tests. It'd be 
> nice to use DATE type instead.
> There is a discussion thread about switching some date string columns to use 
> DATE type:
> https://lists.apache.org/thread/kpq5mz77zrkrrk3wydjr40j6rz58hh8g
> CC [~gaborkaszab], [~csringhofer], [~laszlog]


