[jira] [Created] (IMPALA-6652) KRPC : Data Stream Manager Deferred RPCs in memz page isn't correct

2018-03-13 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created IMPALA-6652:
---

 Summary: KRPC : Data Stream Manager Deferred RPCs in memz page 
isn't correct
 Key: IMPALA-6652
 URL: https://issues.apache.org/jira/browse/IMPALA-6652
 Project: IMPALA
  Issue Type: Bug
  Components: Distributed Exec
Affects Versions: Impala 2.11.0
Reporter: Mostafa Mokhtar
Assignee: Lars Volker


While loading data into a Kudu table against the latest Impala 2.11.0, I noticed 
that "Data Stream Manager Deferred RPCs" on the memz page isn't accurate.
 
From the memz page on a worker:

{code}
Process: Limit=201.73 GB Total=85.41 GB Peak=85.41 GB
  Buffer Pool: Free Buffers: Total=43.64 MB
  Buffer Pool: Clean Pages: Total=0
  Buffer Pool: Unused Reservation: Total=-17.84 MB
  Data Stream Service Queue: Limit=10.09 GB Total=0 Peak=512.97 MB
  Data Stream Manager Deferred RPCs: Total=0 Peak=0
  TCMalloc Overhead: Total=124.07 MB
  Free Disk IO Buffers: Total=984.97 MB Peak=984.97 MB
  RequestPool=root.default: Total=83.92 GB Peak=83.92 GB
    Query(844a0200d7876345:20bb38b9): Reservation=70.44 GB ReservationLimit=161.39 GB OtherMemory=13.48 GB Total=83.92 GB Peak=83.92 GB
      Fragment 844a0200d7876345:20bb38b900a3: Reservation=70.44 GB OtherMemory=38.08 MB Total=70.47 GB Peak=70.47 GB
        SORT_NODE (id=2): Reservation=70.44 GB OtherMemory=8.00 KB Total=70.44 GB Peak=70.44 GB
        EXCHANGE_NODE (id=1): Reservation=18.06 MB OtherMemory=0 Total=18.06 MB Peak=19.53 MB
          KrpcDeferredRpcs: Total=0 Peak=1.47 MB
        KuduTableSink: Total=20.00 MB Peak=20.00 MB
        CodeGen: Total=438.00 B Peak=306.00 KB
      Fragment 844a0200d7876345:20bb38b90022: Reservation=0 OtherMemory=13.44 GB Total=13.44 GB Peak=13.97 GB
        HDFS_SCAN_NODE (id=0): Total=13.44 GB Peak=13.97 GB
        KrpcDataStreamSender (dst_id=1): Total=2.57 MB Peak=3.61 MB
        CodeGen: Total=234.00 B Peak=52.50 KB
  Untracked Memory: Total=389.18 MB
{code}
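Note the discrepancy: the process-wide "Data Stream Manager Deferred RPCs" tracker reports Peak=0, while the per-query "KrpcDeferredRpcs" tracker below reports Peak=1.47 MB. A minimal sketch (Python; purely illustrative, not Impala's actual C++ MemTracker) of how hierarchical trackers report Total/Peak, and how a process-level tracker's peak can stay 0 if deferred-RPC memory is charged only against a per-query tracker chain that does not include it:

```python
class MemTracker:
    """Toy hierarchical memory tracker: consumption propagates to ancestors."""
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.total = 0
        self.peak = 0

    def consume(self, n):
        t = self
        while t is not None:
            t.total += n
            t.peak = max(t.peak, t.total)
            t = t.parent

    def release(self, n):
        self.consume(-n)

process = MemTracker("Process")
deferred_global = MemTracker("Data Stream Manager Deferred RPCs", parent=process)
query = MemTracker("Query", parent=process)
deferred_query = MemTracker("KrpcDeferredRpcs", parent=query)

# Deferred RPC payloads charged only to the per-query tracker chain:
deferred_query.consume(1 << 20)
deferred_query.release(1 << 20)

# The per-query tracker recorded a peak; the global tracker never saw it.
print(deferred_query.peak, deferred_global.peak)  # 1048576 0
```

Whether this is the actual cause here needs investigation; the sketch only shows that the two numbers come from different tracker chains and need not agree.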
 
And a snapshot from the query profile:
 
{code}
Instance 844a0200d7876345:20bb38b900a3 (host=va1030.halxg.cloudera.com:22000):(Total: 1s172ms, non-child: 200.411ms, % non-child: 17.09%)
  Fragment Instance Lifecycle Event Timeline: 1s173ms
     - Prepare Finished: 199.691ms (199.691ms)
     - Open Finished: 1s173ms (973.902ms)
  MemoryUsage(1m4s): 4.77 GB, 13.21 GB, 19.60 GB, 23.70 GB, 26.67 GB, 29.21 GB,
    31.50 GB, 33.63 GB, 35.40 GB, 37.14 GB, 38.54 GB, 39.79 GB, 41.09 GB, 42.37 GB,
    43.60 GB, 44.80 GB, 45.95 GB, 47.01 GB, 48.09 GB, 49.17 GB, 50.22 GB, 51.21 GB,
    52.40 GB, 53.46 GB, 54.58 GB, 55.61 GB, 56.58 GB, 57.53 GB, 58.45 GB, 59.39 GB,
    60.31 GB, 61.20 GB, 62.12 GB, 63.04 GB, 64.15 GB, 65.11 GB, 66.15 GB, 67.06 GB,
    67.87 GB, 68.66 GB, 69.49 GB
  ThreadUsage(1m4s): 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
   - AverageThreadTokens: 1.00
   - BloomFilterBytes: 0
   - PeakMemoryUsage: 70.47 GB (75663229366)
   - PeakReservation: 70.43 GB (75623301120)
   - PeakUsedReservation: 0
   - PerHostPeakMemUsage: 83.91 GB (90098285809)
   - RowsProduced: 0 (0)
   - TotalNetworkReceiveTime: 34m43s
   - TotalNetworkSendTime: 0.000ns
   - TotalStorageWaitTime: 0.000ns
   - TotalThreadsInvoluntaryContextSwitches: 7 (7)
   - TotalThreadsTotalWallClockTime: 973.873ms
     - TotalThreadsSysTime: 2.000ms
     - TotalThreadsUserTime: 55.991ms
   - TotalThreadsVoluntaryContextSwitches: 25 (25)
  Buffer pool:
     - AllocTime: 0.000ns
     - CumulativeAllocationBytes: 0
     - CumulativeAllocations: 0 (0)
     - PeakReservation: 0
     - PeakUnpinnedBytes: 0
     - PeakUsedReservation: 0
     - ReadIoBytes: 0
     - ReadIoOps: 0 (0)
     - ReadIoWaitTime: 0.000ns
     - ReservationLimit: 0
     - WriteIoBytes: 0
     - WriteIoOps: 0 (0)
     - WriteIoWaitTime: 0.000ns
  Fragment Instance Lifecycle Timings:
     - ExecTime: 0.000ns
       - ExecTreeExecTime: 0.000ns
     - OpenTime: 973.876ms
       - ExecTreeOpenTime: 915.567ms
     - PrepareTime: 198.988ms
       - ExecTreePrepareTime: 155.134us
  KuduTableSink:(Total: 12.589us, non-child: 12.589us, % non-child: 100.00%)
     - KuduApplyTimer: 0.000ns
     - NumRowErrors: 0 (0)
     - PeakMemoryUsage: 20.00 MB (20971520)
     - RowsProcessedRate: 0
     - TotalNumRows: 0 (0)
  SORT_NODE (id=2):(Total: 915.718ms, non-child: 0.000ns, % non-child: 0.00%)
    SortType: Partial
    ExecOption: Codegen Enabled
     - NumRowsPerRun: 0 (0) (Number of samples: 0)
     - InMemorySortTime: 0.000ns
     - PeakMemoryUsage: 70.43 GB (75623309312)
     - RowsReturned: 0 (0)
     - RowsReturnedRate: 0
     - RunsCreated: 1 (1)
     - SortDataSize: 0
    Buffer pool:
       - AllocTime: 2m47s
       - CumulativeAllocationBytes: 70.43 GB (75623301120)
       - CumulativeAllocations: 36.06K (36060)
       - PeakReservation: 70.43 GB (75623301120)
       - PeakUnpinnedBytes: 0
       - PeakUsedReservation: 70.43 GB (75623301120)
       - ReadIoBytes: 0
       - ReadIoOps: 0 (0)
       - ReadIoWaitTime: 0.000ns
       - WriteIoBytes: 0
       - WriteIoOps: 0 (0)
       - WriteIoWaitTime: 0.000ns
  EXCHANGE_NODE (id=1):(Total: 34m53s, non-child: 17s052ms, % non-child: 0.81%)
     - ConvertRowBatchTime: 7s479ms
     - PeakMemoryUsage: 19.53 MB (20481319)
     - RowsReturned: 276.19M (276187128)
{code}

[jira] [Moved] (IMPALA-6653) Unicode support for Kudu table names

2018-03-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon moved KUDU-2340 to IMPALA-6653:
---

Workflow: jira  (was: Kudu Workflow)
 Key: IMPALA-6653  (was: KUDU-2340)
 Project: IMPALA  (was: Kudu)

> Unicode support for Kudu table names
> 
>
> Key: IMPALA-6653
> URL: https://issues.apache.org/jira/browse/IMPALA-6653
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Jim Halfpenny
>Priority: Major
>
> It is possible to create a Kudu table containing unicode characters in its 
> name in Impala by specifying the kudu.table_name attribute. When trying to 
> select from this table, you receive an error that the underlying table does 
> not exist.
> The example below shows a table being created successfully, but failing on a 
> select * statement.
> {{[jh-kafka-2:21000] > create table test2( a int primary key) stored as kudu 
> TBLPROPERTIES('kudu.table_name' = 'impala::kudutest.');}}
> {{Query: create table test2( a int primary key) stored as kudu 
> TBLPROPERTIES('kudu.table_name' = 'impala::kudutest.')}}
> {{WARNINGS: Unpartitioned Kudu tables are inefficient for large data 
> sizes.}}{{Fetched 0 row(s) in 0.64s}}
> {{[jh-kafka-2:21000] > select * from test2;}}
> {{Query: select * from test2}}
> {{Query submitted at: 2018-03-13 08:23:29 (Coordinator: 
> https://jh-kafka-2:25000)}}
> {{ERROR: AnalysisException: Failed to load metadata for table: 'test2'}}
> {{CAUSED BY: TableLoadingException: Error loading metadata for Kudu table 
> impala::kudutest.}}
> {{CAUSED BY: ImpalaRuntimeException: Error opening Kudu table 
> 'impala::kudutest.', Kudu error: The table does not exist: table_name: 
> "impala::kudutest."}}
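One plausible mechanism (purely illustrative, not a diagnosis; the non-ASCII character in the original report was stripped by the mail archive, so the name below is invented) is the table name being stored under one byte representation and looked up under another:

```python
# Hypothetical: the metastore keys the table by its UTF-8 name, but the
# lookup path round-trips the name through a single-byte codec (mojibake),
# so the lookup key no longer matches the stored key.
stored = "impala::kudutest.\u00e9"        # "é" stands in for the stripped char
catalog = {stored: "kudu-table-id"}       # toy stand-in for the Kudu catalog

mangled = stored.encode("utf-8").decode("latin-1")
print(mangled in catalog)   # False -> "The table does not exist"
print(stored in catalog)    # True
```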



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-4238) custom_cluster/test_client_ssl.py TestClientSsl.test_ssl AssertionError: SIGINT was not caught by shell within 30s

2018-03-13 Thread Sailesh Mukil (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailesh Mukil resolved IMPALA-4238.
---
Resolution: Duplicate

> custom_cluster/test_client_ssl.py TestClientSsl.test_ssl AssertionError: 
> SIGINT was not caught by shell within 30s
> --
>
> Key: IMPALA-4238
> URL: https://issues.apache.org/jira/browse/IMPALA-4238
> Project: IMPALA
>  Issue Type: Bug
>  Components: Security
>Affects Versions: Impala 2.8.0, Impala 2.10.0
>Reporter: Harrison Sheinblatt
>Assignee: Sailesh Mukil
>Priority: Major
>  Labels: flaky
>
> asf master core test failure: 
> http://sandbox.jenkins.sf.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core/540/
> http://sandbox.jenkins.sf.cloudera.com/job/impala-umbrella-build-and-test/4921/console
> {noformat}
> 08:18:51 === FAILURES 
> ===
> 08:18:51  TestClientSsl.test_ssl[exec_option: {'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 
> 'num_nodes': 0} | table_format: text/none] 
> 08:18:51 
> 08:18:51 self = 
> 08:18:51 vector = 
> 08:18:51 
> 08:18:51 @pytest.mark.execute_serially
> 08:18:51 
> @CustomClusterTestSuite.with_args("--ssl_server_certificate=%s/server-cert.pem
>  "
> 08:18:51   
> "--ssl_private_key=%s/server-key.pem"
> 08:18:51   % (CERT_DIR, CERT_DIR))
> 08:18:51 def test_ssl(self, vector):
> 08:18:51 
> 08:18:51   self._verify_negative_cases()
> 08:18:51   # TODO: This is really two different tests, but the custom 
> cluster takes too long to
> 08:18:51   # start. Make it so that custom clusters can be specified 
> across test suites.
> 08:18:51   self._validate_positive_cases("%s/server-cert.pem" % 
> self.CERT_DIR)
> 08:18:51 
> 08:18:51   # No certificate checking: will accept any cert.
> 08:18:51   self._validate_positive_cases()
> 08:18:51 
> 08:18:51   # Test cancelling a query
> 08:18:51   impalad = ImpaladService(socket.getfqdn())
> 08:18:51   impalad.wait_for_num_in_flight_queries(0)
> 08:18:51   p = ImpalaShell(args="--ssl")
> 08:18:51   p.send_cmd("SET DEBUG_ACTION=0:OPEN:WAIT")
> 08:18:51   p.send_cmd("select count(*) from functional.alltypes")
> 08:18:51   impalad.wait_for_num_in_flight_queries(1)
> 08:18:51 
> 08:18:51   LOG = logging.getLogger('test_client_ssl')
> 08:18:51   LOG.info("Cancelling query")
> 08:18:51   num_tries = 0
> 08:18:51   # In practice, sending SIGINT to the shell process doesn't 
> always seem to get caught
> 08:18:51   # (and a search shows up some bugs in Python where SIGINT 
> might be ignored). So retry
> 08:18:51   # for 30s until one signal takes.
> 08:18:51   while impalad.get_num_in_flight_queries() == 1:
> 08:18:51 time.sleep(1)
> 08:18:51 LOG.info("Sending signal...")
> 08:18:51 os.kill(p.pid(), signal.SIGINT)
> 08:18:51 num_tries += 1
> 08:18:51 >   assert num_tries < 30, "SIGINT was not caught by shell 
> within 30s"
> 08:18:51 E   AssertionError: SIGINT was not caught by shell within 30s
> 08:18:51 E   assert 30 < 30
> 08:18:51 
> 08:18:51 custom_cluster/test_client_ssl.py:85: AssertionError
> 08:18:51  Captured stdout setup 
> -
> 08:18:51 Starting State Store logging to 
> /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 08:18:51 Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 08:18:51 Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 08:18:51 Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 08:18:51 Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 08:18:51 Waiting for Catalog... Status: 53 DBs / 1077 tables (ready=True)
> 08:18:51 Waiting for Catalog... Status: 53 DBs / 1077 tables (ready=True)
> 08:18:51 Waiting for Catalog... Status: 53 DBs / 1077 tables (ready=True)
> 08:18:51 Impala Cluster Running with 3 nodes.
> 08:18:51  Captured stderr setup 
> -
> 08:18:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:18:51 MainThread: 

[jira] [Resolved] (IMPALA-6638) File handle cache shows contention when cold

2018-03-13 Thread Joe McDonnell (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-6638.
---
   Resolution: Fixed
Fix Version/s: Impala 2.12.0

> File handle cache shows contention when cold
> 
>
> Key: IMPALA-6638
> URL: https://issues.apache.org/jira/browse/IMPALA-6638
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 2.12.0
>
>
> Performance tests show that when the file handle cache is cold, there is 
> contention on the file handle cache partition lock. This added contention is 
> particularly severe when multiple IO threads access the same file (e.g. when 
> a query reads multiple Parquet columns), since IO threads accessing the same 
> file all map to the same cache partition.
> The contention is due to the fact that FileHandleCache::GetFileHandle() holds 
> the lock while it opens the file handle. This lengthens the critical section 
> considerably, because opening a file handle involves network traffic to the 
> NameNode. This contention does not exist when the cache is hot.
> FileHandleCache::GetFileHandle() should drop the lock while it is opening the 
> file handle.
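The proposed fix is the standard pattern of dropping a cache lock around a slow miss path. A sketch in Python (illustrative, not Impala's actual C++; the opener callback stands in for the network-bound HDFS open):

```python
import threading

class FileHandleCache:
    def __init__(self, opener):
        self._lock = threading.Lock()
        self._cache = {}
        self._opener = opener  # slow call, e.g. an open that talks to the NameNode

    def get(self, path):
        with self._lock:
            if path in self._cache:
                return self._cache[path]
        # Miss: open WITHOUT holding the lock, so other threads on the same
        # partition are not blocked behind the slow open.
        handle = self._opener(path)
        with self._lock:
            # Another thread may have raced us; keep the first handle inserted.
            return self._cache.setdefault(path, handle)

cache = FileHandleCache(opener=lambda p: f"handle({p})")
print(cache.get("/data/f.parq"))  # handle(/data/f.parq)
```

The trade-off is that two racing threads may both open the file; one handle is discarded, which is cheaper than serializing every miss behind the lock.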





[jira] [Created] (IMPALA-6655) Set owner information on database creation

2018-03-13 Thread Fredy Wijaya (JIRA)
Fredy Wijaya created IMPALA-6655:


 Summary: Set owner information on database creation
 Key: IMPALA-6655
 URL: https://issues.apache.org/jira/browse/IMPALA-6655
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Fredy Wijaya
Assignee: Fredy Wijaya


Currently, Impala only shows owner information via DESCRIBE DATABASE 
EXTENDED for databases created outside Impala. When a database is created 
inside Impala, the owner information is never set. For table creation, Impala 
always sets the owner information, which can be shown using DESCRIBE EXTENDED. 
To make the behavior consistent, we should set the owner information on 
database creation.
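The intended behavior can be sketched as follows (Python pseudocode, not Impala's Java frontend; all names here are illustrative): on CREATE DATABASE, record the session user as owner, mirroring what table creation already does.

```python
def create_database(catalog, name, session_user):
    # Mirror table creation: record who created the database so that
    # DESCRIBE DATABASE EXTENDED can report an owner.
    catalog[name] = {"name": name, "owner": session_user, "owner_type": "USER"}
    return catalog[name]

catalog = {}
db = create_database(catalog, "sales", session_user="fredy")
print(db["owner"])  # fredy
```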





[jira] [Created] (IMPALA-6654) [DOCS] Kudu/Sentry docs are out of date

2018-03-13 Thread Thomas Tauber-Marshall (JIRA)
Thomas Tauber-Marshall created IMPALA-6654:
--

 Summary: [DOCS] Kudu/Sentry docs are out of date
 Key: IMPALA-6654
 URL: https://issues.apache.org/jira/browse/IMPALA-6654
 Project: IMPALA
  Issue Type: Bug
  Components: Docs
Affects Versions: Impala 2.11.0
Reporter: Thomas Tauber-Marshall


The documentation of Impala's support for Sentry authorization on Kudu tables, 
available here:
http://impala.apache.org/docs/build/html/topics/impala_kudu.html

is out of date. It should be updated to include the changes made in 
IMPALA-5489. In particular:
- Access is no longer "all or nothing": column-level permissions are supported.
- Permissions no longer apply "to all SQL operations": SELECT- and 
INSERT-specific permissions are supported, while DELETE/UPDATE/UPSERT still 
require ALL.

We should also document that "all on server" is required to specify 
"kudu.master_addresses" in a CREATE statement, even for managed tables, in 
addition to being required to CREATE any external table.





[jira] [Resolved] (IMPALA-6449) Use CLOCK_MONOTONIC in ConditionVariable

2018-03-13 Thread Michael Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-6449.

   Resolution: Fixed
Fix Version/s: Impala 2.12.0
   Impala 3.0

https://github.com/apache/impala/commit/30c0375ed358f8040d28fe756a17c6e3965177b1

IMPALA-6449: Use CLOCK_MONOTONIC in ConditionVariable

ConditionVariable is a thin wrapper around pthread_cond_*.
Currently, pthread_cond_timedwait() uses the default attribute
CLOCK_REALTIME. This is susceptible to adjustments of the system
clock from sources such as NTP, so time may go backward.
This change fixes the problem by switching to CLOCK_MONOTONIC,
so time is monotonic, although the frequency of the clock ticks
may still be adjusted by NTP. Ideally, we would use CLOCK_MONOTONIC_RAW,
but it is available only on Linux kernel 2.6.28 or later. This change
also gets rid of some usages of boost::get_system_time() which suffer
from the same problem.

Change-Id: I81611cfd5e7c5347203fe7fa6b0f615602257f87
Reviewed-on: http://gerrit.cloudera.org:8080/9158
Reviewed-by: Michael Ho 
Tested-by: Impala Public Jenkins

> Use CLOCK_MONOTONIC in ConditionVariable
> ---
>
> Key: IMPALA-6449
> URL: https://issues.apache.org/jira/browse/IMPALA-6449
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, 
> Impala 2.11.0, Impala 2.12.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Minor
> Fix For: Impala 3.0, Impala 2.12.0
>
>
> There are various places in the code which call 
> {{ConditionVariable::WaitUntil()}} or {{ConditionVariable::WaitFor()}} with a 
> time computed from {{boost::get_system_time()}}.
> {noformat}
>   template 
>   bool WaitFor(boost::unique_lock& lock,
>   const duration_type& wait_duration) {
> return WaitUntil(lock, to_timespec(boost::get_system_time() + 
> wait_duration));
>   }
> {noformat}
> blocking-queue.h:
> {noformat}
>   template 
>   bool BlockingPutWithTimeout(V&& val, int64_t timeout_micros) {
> MonotonicStopWatch timer;
> boost::unique_lock write_lock(put_lock_);
> boost::system_time wtime = boost::get_system_time() +
> boost::posix_time::microseconds(timeout_micros);
> {noformat}
> thrift-server.cc:
> {noformat}
>   system_time deadline = get_system_time() +
>   
> posix_time::milliseconds(ThriftServer::ThriftServerEventProcessor::TIMEOUT_MS);
>   // Loop protects against spurious wakeup. Locks provide necessary fences to 
> ensure
>   // visibility.
>   while (!signal_fired_) {
> // Yields lock and allows supervision thread to continue and signal
> if (!signal_cond_.WaitUntil(lock, deadline)) {
> {noformat}
> The above are susceptible to clock adjustment from various sources such as 
> NTP. We should switch to using {{clock_gettime(CLOCK_MONOTONIC, ...)}} for 
> such elapsed time measurement.
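The same deadline pattern, sketched in Python (illustrative; `time.monotonic()` plays the role of CLOCK_MONOTONIC, `time.time()` that of CLOCK_REALTIME):

```python
import threading
import time

def wait_with_deadline(cond, predicate, timeout_s):
    """Wait on a condition using a monotonic-clock deadline.

    time.monotonic() cannot jump when NTP steps the system clock,
    unlike time.time(), so the deadline cannot move backward or forward.
    """
    deadline = time.monotonic() + timeout_s
    with cond:
        while not predicate():            # loop protects against spurious wakeup
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False              # timed out
            cond.wait(remaining)
    return True

cond = threading.Condition()
print(wait_with_deadline(cond, lambda: True, 0.1))   # True (predicate holds)
print(wait_with_deadline(cond, lambda: False, 0.05)) # False (times out)
```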





[jira] [Resolved] (IMPALA-6624) Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c

2018-03-13 Thread Michael Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-6624.

   Resolution: Fixed
Fix Version/s: Impala 2.12.0
   Impala 3.0

Fixed at 
[https://github.com/apache/impala/commit/8079cd9d2a87051f81a41910b74fab15e35f36ea]

KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write()

Previously, OutboundTransfer::TransferStarted() returned true iff
non-zero bytes had been successfully sent via Writev(). As it turns
out, this doesn't work well with SSL_write(). When SSL_write() returns -1
with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly
the same buffer pointer next time even if 0 bytes have been written.

The following sequence becomes problematic with the previous implementation
of OutboundTransfer::TransferStarted():

- WriteHandler() calls SendBuffer() on an OutboundTransfer.
- SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above.
  Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0
  and OutboundTransfer::TransferStarted() still returns false.
- OutboundTransfer is cancelled or timed out. car->call is set to NULL.
- WriteHandler() is called again and as it notices that the OutboundTransfer
  hasn't really started yet and "car->call" is NULL due to cancellation, it
  removes it from the outbound transfer queue and moves on to the next entry
  in the queue.
- WriteHandler() calls SendBuffer() with the next entry in the queue and
  eventually calls SSL_write() with a different buffer than expected by
  SSL_write(), leading to "SSL3_WRITE_PENDING:bad write retry" error.

This change fixes the problem above by adding a boolean flag 'started_'
which is set to true if OutboundTransfer::SendBuffer() has been called
at least once. Also added some tests to exercise cancellation paths with
multiple concurrent RPCs.

Confirmed the problem above is fixed by running stress test in a 130 node
cluster with Impala. The problem happened consistently without the fix.

Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Reviewed-on: http://gerrit.cloudera.org:8080/9587
Reviewed-by: Sailesh Mukil 
Reviewed-by: Todd Lipcon 
Tested-by: Todd Lipcon 
Reviewed-on: http://gerrit.cloudera.org:8080/9606
Tested-by: Impala Public Jenkins
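The invariant the fix restores can be illustrated with Python's ssl module, which surfaces the same OpenSSL rule: after a want-write error, the next write must retry the identical buffer, never a different one. A sketch with a fake socket standing in for a real TLS connection (socket setup omitted; the fake is invented for illustration):

```python
import ssl

def send_all(sock, payload):
    """Write payload fully; on SSLWantWriteError, retry the SAME bytes.

    OpenSSL's SSL_write() remembers the pending buffer; retrying with a
    different pointer triggers SSL3_WRITE_PENDING:bad write retry.
    """
    view = memoryview(payload)
    while view.nbytes:
        try:
            sent = sock.send(view)
        except ssl.SSLWantWriteError:
            continue            # retry the identical call; never swap buffers
        view = view[sent:]

class FakeTlsSocket:
    """Raises want-write on the first call, then accepts data."""
    def __init__(self):
        self.calls, self.received = 0, b""
    def send(self, view):
        self.calls += 1
        if self.calls == 1:
            raise ssl.SSLWantWriteError()
        self.received += bytes(view)
        return view.nbytes

s = FakeTlsSocket()
send_all(s, b"hello")
print(s.received)  # b'hello'
```

The Kudu bug was the queue-level analogue: after a want-write with 0 bytes written, the transfer looked "not started", so cancellation could swap in a different buffer for the retry.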

> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c
> -
>
> Key: IMPALA-6624
> URL: https://issues.apache.org/jira/browse/IMPALA-6624
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Blocker
> Fix For: Impala 3.0, Impala 2.12.0
>
>
> During stress testing in a secure 140 node cluster, Impalad ran into the 
> following errors. This is supposed to be fixed in KUDU-2218. The fix for 
> KUDU-2218 has already been cherry-picked to Impala code base at this 
> [commit|https://github.com/apache/impala/commit/678bf28e233e667b05585110422762614840bdc2]
>  and the build should have this commit. It's unclear whether Impala is missing 
> other commits or whether the issue in KUDU-2218 is not completely fixed.
> Assigning to [~sailesh] to lead the investigation. Please feel free to 
> reassign it to me if you are swamped, Sailesh.
> {noformat}
> W0307 03:31:04.512100 158268 connection.cc:659] client connection to 
> 10.17.221.47:27000 send error: Network error: failed to write to TLS socket: 
> error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> W0307 03:31:04.524086 158268 connection.cc:153] Shutting down client 
> connection to 10.17.221.47:27000 with pending inbound data (11/16 bytes 
> received, last active 0 ns ago, status=Network error: failed to write to TLS 
> socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad writ
> e retry:s3_pkt.c:874)
> E0307 03:31:04.535635 123156 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> W0307 03:31:04.536145 158268 connection.cc:190] Error closing socket: Network 
> error: TlsSocket::Close: Success
> E0307 03:31:04.584370 140087 krpc-data-stream-sender.cc:335] channel send to 
> 10.17.221.47:27000 failed: TransmitData() to 10.17.221.47:27000 failed: 
> Network error: failed to write to TLS socket: error:1409F07F:SSL 
> routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:874
> I0307 03:31:04.697773 158412 rpcz_store.cc:255] Call 
> 

[jira] [Created] (IMPALA-6656) Metrics for time spent in BufferAllocator

2018-03-13 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-6656:
-

 Summary: Metrics for time spent in BufferAllocator
 Key: IMPALA-6656
 URL: https://issues.apache.org/jira/browse/IMPALA-6656
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Tim Armstrong
Assignee: Tim Armstrong


We should track the total time spent and the time spent in TCMalloc so we can 
understand where time is going globally. 

I think we should shard these metrics across the arenas so we can see if the 
problem is just per-arena, and also to avoid contention between threads when 
updating the metrics.
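The sharding idea can be sketched like this (Python; illustrative, not the actual BufferAllocator code): each arena updates only its own counter, so writers never contend, and readers aggregate across shards on demand.

```python
import threading

class ShardedTimeMetric:
    """Per-arena time counters: writers touch only their own shard,
    avoiding contention; readers sum across shards for the global view."""
    def __init__(self, num_shards):
        self._shards = [0.0] * num_shards
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def add(self, shard, seconds):
        with self._locks[shard]:
            self._shards[shard] += seconds

    def total(self):
        return sum(self._shards)      # global view

    def per_arena(self):
        return list(self._shards)     # spot a single hot arena

m = ShardedTimeMetric(num_shards=4)
m.add(0, 1.5)
m.add(3, 0.5)
print(m.total())       # 2.0
print(m.per_arena())   # [1.5, 0.0, 0.0, 0.5]
```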





[jira] [Created] (IMPALA-6657) Investigate why memory allocation in Exchange receiver node takes a long time

2018-03-13 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created IMPALA-6657:
---

 Summary: Investigate why memory allocation in Exchange receiver 
node takes a long time
 Key: IMPALA-6657
 URL: https://issues.apache.org/jira/browse/IMPALA-6657
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.11.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Attachments: Impala query profile.txt

While inserting large amounts of data into a Kudu table, the Exchange operator 
was observed to run slowly; the query profile showed that a large portion of 
the time was spent in buffer pool memory allocation.

{code}
  EXCHANGE_NODE (id=1):(Total: 5h53m, non-child: 48s853ms, % non-child: 
0.23%)
   - ConvertRowBatchTime: 20s289ms
   - PeakMemoryUsage: 19.53 MB (20483562)
   - RowsReturned: 575.30M (575298780)
   - RowsReturnedRate: 27.10 K/sec
  Buffer pool:
 - AllocTime: 2h53m
 - CumulativeAllocationBytes: 261.26 GB (280526643200)
 - CumulativeAllocations: 13.70M (13697590)
 - PeakReservation: 18.06 MB (18939904)
 - PeakUnpinnedBytes: 0
 - PeakUsedReservation: 18.06 MB (18939904)
 - ReadIoBytes: 0
 - ReadIoOps: 0 (0)
 - ReadIoWaitTime: 0.000ns
 - WriteIoBytes: 0
 - WriteIoOps: 0 (0)
 - WriteIoWaitTime: 0.000ns
  RecvrSide:
BytesReceived(8m32s): 20.91 GB, 37.03 GB, 45.62 GB, 53.22 GB, 60.17 
GB, 66.30 GB, 71.60 GB, 76.59 GB, 81.36 GB, 86.03 GB, 90.35 GB, 94.30 GB, 98.17 
GB, 101.98 GB, 105.58 GB, 109.08 GB, 112.33 GB, 115.47 GB, 118.45 GB, 121.30 
GB, 124.09 GB, 126.74 GB, 129.26 GB, 131.88 GB, 134.41 GB, 136.85 GB, 139.32 
GB, 141.77 GB, 144.23 GB, 146.71 GB, 148.26 GB, 148.29 GB, 148.29 GB, 148.29 
GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 
GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 
GB, 148.30 GB, 148.30 GB, 148.30 GB, 148.30 GB, 148.30 GB, 148.30 GB, 148.30 
GB, 148.30 GB
 - FirstBatchArrivalWaitTime: 1s071ms
 - TotalBytesReceived: 148.30 GB (159234237617)
 - TotalGetBatchTime: 5h53m
   - DataArrivalTimer: 5h52m
  SenderSide:
 - DeserializeRowBatchTime: 3h4m
 - NumBatchesArrived: 6.85M (6848795)
 - NumBatchesDeferred: 99.67K (99667)
 - NumBatchesEnqueued: 6.85M (6848795)
 - NumBatchesReceived: 6.85M (6848795)
 - NumEarlySenders: 0 (0)
 - NumEosReceived: 0 (0)
{code}
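A quick back-of-the-envelope from the counters above shows why the allocation path dominates:

```python
# From the profile: AllocTime = 2h53m, CumulativeAllocations = 13,697,590,
# EXCHANGE_NODE total = 5h53m.
alloc_time_s = 2 * 3600 + 53 * 60          # 10,380 s spent allocating
total_exchange_s = 5 * 3600 + 53 * 60      # 21,180 s total in the exchange node
allocations = 13_697_590

avg_us = alloc_time_s / allocations * 1e6
print(round(avg_us))                       # ~758 microseconds per allocation
print(f"{alloc_time_s / total_exchange_s:.0%}")  # ~49% of exchange time
```

Roughly three quarters of a millisecond per buffer allocation, accounting for about half the exchange node's time, which is why AllocTime is the first place to look.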




