[jira] [Commented] (KUDU-1689) [Python] - Expose Table Creator API

2016-10-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564398#comment-15564398
 ] 

Todd Lipcon commented on KUDU-1689:
---

I'm curious about your thoughts here. I had originally thought that kwargs were 
more Pythonic than the "builder" style in Python (builders are common in C++ 
and Java because those languages lack kwargs).
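
For reference, the "builder" style mentioned here is what the C++ client already 
exposes via KuduTableCreator. A rough sketch of that style, based on the C++ 
client API (method names believed correct for Kudu 1.0, but not verified against 
a specific release):

{code}
#include <memory>
#include <string>
#include <vector>

#include <kudu/client/client.h>

using kudu::Status;
using kudu::client::KuduClient;
using kudu::client::KuduSchema;
using kudu::client::KuduTableCreator;

// Builder style: each setter returns the creator, so new options can be
// chained on without growing a single create_table() signature.
Status CreateExampleTable(KuduClient* client, const KuduSchema& schema) {
  std::unique_ptr<KuduTableCreator> creator(client->NewTableCreator());
  return creator->table_name("example_table")
      .schema(&schema)
      .add_hash_partitions({"key"}, /*num_buckets=*/4)
      .num_replicas(3)
      .Create();
}
{code}

A kwargs-based Python API would instead fold the same options into named 
parameters of a single create_table() call, which is the trade-off being 
discussed.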

> [Python] - Expose Table Creator API
> ---
>
> Key: KUDU-1689
> URL: https://issues.apache.org/jira/browse/KUDU-1689
> Project: Kudu
>  Issue Type: New Feature
>  Components: python
>Affects Versions: 1.0.0
>Reporter: Jordan Birdsell
>Assignee: Jordan Birdsell
>Priority: Minor
>
> Currently the python client doesn't expose the TableCreator API. To keep the 
> table creation process extensible, this should be exposed so that we avoid 
> constantly adding parameters to the create_table method.





[jira] [Updated] (KUDU-1693) Flush write operations on per-TS basis and add corresponding limit on the buffer space

2016-10-10 Thread Alexey Serbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-1693:

Description: 
Currently, the Kudu C++ client buffers incoming operations regardless of their 
destination tablet server.  Accordingly, it's possible to set a limit only on 
the _total_ buffer space, not per tablet server.  This approach works, but 
there is room for improvement: there are real-world scenarios where per-TS 
buffering would be more robust.  Besides, tablet servers impose a limit on the 
size of RPC operations.

Grouping write operations on a per-tablet-server basis would be beneficial for 
the 'one-out-of-many lagging tablet servers' scenario, where all tablet servers 
for a table perform well except for one that runs slowly due to excessive IO, 
network issues, a failing disk, etc.  The problem is that the lagging server 
hinders overall performance.  This is due to the current approach to buffer 
turnaround: a buffer is considered 'flushed' and its space is reclaimed only 
once _all_ operations in the buffer have completed.  So, if 1000 operations 
have already been sent but 1 operation is still in progress, the whole buffer's 
space is 'locked' and cannot be reused.

In addition, introducing a per-tablet-server buffer limit would help to address 
scenarios with concurrent writes into tables with extremely diverse partition 
factors (like 2 and 100).  E.g., consider a case where incoming write 
operations for tables with diverse partition factors are intermixed in the 
context of one session.  The problem is that setting the total buffer space 
limit high is beneficial for writes into the table with many partitions 
(assuming those writes are evenly distributed across the participating 
tablets), but it may exceed the server-side limit on maximum transaction size 
if those writes target the table with just a few partitions.
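
To make the current limitation concrete: today the C++ client exposes only a 
single, session-wide cap on buffered mutations. A minimal sketch using the 
existing session API (method names believed to match the public C++ client; 
the per-tablet-server limit proposed above does not exist yet):

{code}
#include <kudu/client/client.h>

using kudu::Status;
using kudu::client::KuduClient;
using kudu::client::KuduSession;

// All buffered operations share one total limit regardless of which tablet
// server they target, so a single lagging server can keep the whole buffer
// from being reclaimed, as described above.
Status ConfigureSession(KuduClient* client) {
  auto session = client->NewSession();
  Status s = session->SetFlushMode(KuduSession::AUTO_FLUSH_BACKGROUND);
  if (!s.ok()) {
    return s;
  }
  // One global cap (e.g. 8 MiB); there is no per-tablet-server equivalent.
  return session->SetMutationBufferSpace(8 * 1024 * 1024);
}
{code}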

  was:
Grouping write operations on a per-tablet-server basis would be beneficial for 
the 'one-out-of-many lagging tablet servers' scenario, where all tablet servers 
for a table perform well except for one that runs slowly for some reason 
(excessive IO, network issues, a failing disk, etc.).  The problem is that the 
lagging server hinders buffer turnaround: a buffer is considered 'flushed' and 
its space is reclaimed only once _all_ operations in the buffer have completed.  
So, if 1000 operations are flushed but 1 operation is still in progress, the 
whole buffer's space is 'locked' and cannot be reused.

Accordingly, introducing a per-tablet-server buffer limit for write operations 
would help to address scenarios with concurrent writes into tables with very 
different partition factors (like 2 and 100).  E.g., the incoming operations 
for tables with very different partition factors are intermixed in the context 
of the same session.  The problem is that setting the total buffer space limit 
high is fine for writes into the table with many partitions (assuming those 
writes are evenly distributed across the participating tablets), but it may 
exceed the server-side limit on maximum transaction size if those writes 
target a table with just a few partitions.


> Flush write operations on per-TS basis and add corresponding limit on the 
> buffer space
> --
>
> Key: KUDU-1693
> URL: https://issues.apache.org/jira/browse/KUDU-1693
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 1.0.0
>Reporter: Alexey Serbin
>
> Currently, the Kudu C++ client buffers incoming operations regardless of 
> their destination tablet server.  Accordingly, it's possible to set a limit 
> only on the _total_ buffer space, not per tablet server.  This approach 
> works, but there is room for improvement: there are real-world scenarios 
> where per-TS buffering would be more robust.  Besides, tablet servers impose 
> a limit on the size of RPC operations.
> Grouping write operations on a per-tablet-server basis would be beneficial 
> for the 'one-out-of-many lagging tablet servers' scenario, where all tablet 
> servers for a table perform well except for one that runs slowly due to 
> excessive IO, network issues, a failing disk, etc.  The problem is that the 
> lagging server hinders overall performance.  This is due to the current 
> approach to buffer turnaround: a buffer is considered 'flushed' and its 
> space is reclaimed only once _all_ operations in the buffer have completed.  
> So, if 1000 operations have already been sent but 1 operation is still in 
> progress, the whole buffer's space is 'locked' and cannot be reused.
> In addition, introducing a per-tablet-server buffer limit would help to 
> address scenarios with concurrent writes into tables with extremely diverse 
> partition factors (like 2 and 100). 

[jira] [Created] (KUDU-1693) Flush write operations on per-TS basis and add corresponding limit on the buffer space

2016-10-10 Thread Alexey Serbin (JIRA)
Alexey Serbin created KUDU-1693:
---

 Summary: Flush write operations on per-TS basis and add 
corresponding limit on the buffer space
 Key: KUDU-1693
 URL: https://issues.apache.org/jira/browse/KUDU-1693
 Project: Kudu
  Issue Type: Improvement
  Components: client
Affects Versions: 1.0.0
Reporter: Alexey Serbin


Grouping write operations on a per-tablet-server basis would be beneficial for 
the 'one-out-of-many lagging tablet servers' scenario, where all tablet servers 
for a table perform well except for one that runs slowly for some reason 
(excessive IO, network issues, a failing disk, etc.).  The problem is that the 
lagging server hinders buffer turnaround: a buffer is considered 'flushed' and 
its space is reclaimed only once _all_ operations in the buffer have completed.  
So, if 1000 operations are flushed but 1 operation is still in progress, the 
whole buffer's space is 'locked' and cannot be reused.

Accordingly, introducing a per-tablet-server buffer limit for write operations 
would help to address scenarios with concurrent writes into tables with very 
different partition factors (like 2 and 100).  E.g., the incoming operations 
for tables with very different partition factors are intermixed in the context 
of the same session.  The problem is that setting the total buffer space limit 
high is fine for writes into the table with many partitions (assuming those 
writes are evenly distributed across the participating tablets), but it may 
exceed the server-side limit on maximum transaction size if those writes 
target a table with just a few partitions.





[jira] [Commented] (KUDU-1692) Deleting large tablets causes a lot of tcmalloc contention

2016-10-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563979#comment-15563979
 ] 

Todd Lipcon commented on KUDU-1692:
---

This seems to hold some tcmalloc locks; following the deletion I saw a lot of 
log messages like:

{code}
W1010 17:01:42.257256  4710 kernel_stack_watchdog.cc:144] Thread 5078 stuck at 
../../src/kudu/rpc/outbound_call.cc:185 for 154ms:
Kernel stack:
[] hrtimer_nanosleep+0xc4/0x180
[] sys_nanosleep+0x6e/0x80
[] system_call_fastpath+0x16/0x1b
[] 0x

User stack:
@   0x8655e8  base::internal::SpinLockDelay()
@   0x854ac7  SpinLock::SlowLock()
@   0x855777  tcmalloc::ThreadCache::IncreaseCacheLimit()
@  0x1b936d6  operator delete()
{code}

as well as many long reactor freezes (8+ seconds):
{code}
W1010 17:02:58.113458  5078 connection.cc:199] RPC call timeout handler was 
delayed by 8.05891s! This may be due to a process-wide pause such as swapping, 
logging-related delays, or allocator lock contention. Will allow an additional 
1.5608s for a response.
{code}

The pauses were bad enough that this caused lots of leader elections, write 
timeouts, etc. This went on for about a minute and a half before the servers 
became stable again.

> Deleting large tablets causes a lot of tcmalloc contention
> --
>
> Key: KUDU-1692
> URL: https://issues.apache.org/jira/browse/KUDU-1692
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet, util
>Affects Versions: 1.0.0
>Reporter: Todd Lipcon
>
> I deleted a large table which contained about 1TB of data per tablet server. 
> The tablet servers then started spending a large amount of time in this stack:
> {code}
>   855e94 tcmalloc::ThreadCache::GetThreadStats(unsigned 
> long*, unsigned long*) 
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
>   84e9ba ExtractStats(TCMallocStats*, unsigned long*, 
> tcmalloc::PageHeap::SmallSpanStats*, tcmalloc::PageHeap::LargeSpanStats*) 
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-releas
>   850f8f TCMallocImplementation::GetNumericProperty(char 
> const*, unsigned long*) 
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
>  1a18c50 kudu::GetTCMallocCurrentAllocatedBytes() 
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
>  1a19a50 kudu::MemTracker::UpdateConsumption() 
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
>   980f01 std::_Sp_counted_ptr (__gnu_cxx::_Lock_policy)2>::_M_dispose() 
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
>   99a937 kudu::tablet::CFileSet::~CFileSet() 
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
>   99ad61 kudu::tablet::CFileSet::~CFileSet() 
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
>   948b42 kudu::tablet::DiskRowSet::~DiskRowSet() 
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
>   965f35 kudu::tablet::RowSetTree::~RowSetTree() 
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
> {code}
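
For reference, the consumption probe at the bottom of that stack boils down to 
asking tcmalloc for its allocated-bytes statistic. A standalone sketch of that 
call using the gperftools MallocExtension API (an illustration, not the Kudu 
code itself):

{code}
#include <cstddef>
#include <iostream>

#include <gperftools/malloc_extension.h>

// Reading "generic.current_allocated_bytes" makes tcmalloc aggregate
// per-thread cache statistics under its central locks. Doing that on every
// MemTracker consumption update, while destructors are freeing a huge number
// of objects, is what produces the contention described above.
int main() {
  size_t allocated = 0;
  if (MallocExtension::instance()->GetNumericProperty(
          "generic.current_allocated_bytes", &allocated)) {
    std::cout << "currently allocated: " << allocated << " bytes\n";
  }
  return 0;
}
{code}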





[jira] [Commented] (KUDU-1580) Connection negotiation timeout to tablet server is treated as unretriable error

2016-10-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563960#comment-15563960
 ] 

Todd Lipcon commented on KUDU-1580:
---

We also see this frequently in real cluster testing under load, especially 
since the negotiation timeout is 3sec by default.

> Connection negotiation timeout to tablet server is treated as unretriable 
> error
> ---
>
> Key: KUDU-1580
> URL: https://issues.apache.org/jira/browse/KUDU-1580
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.10.0
>Reporter: Todd Lipcon
>Priority: Critical
>
> In the case that the leader tablet server is up but "frozen", the client will 
> get a connection negotiation timeout trying to establish an RPC connection. 
> It appears that this Status::TimeOut() is treated as a non-retriable error by 
> WriteRpc::AnalyzeResponse, so the client gets a failure even if there has 
> been a leader re-election within the client-provided deadline.
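
To illustrate the desired behavior with a generic, hypothetical sketch (this is 
not the actual WriteRpc::AnalyzeResponse code; the types and helper below are 
invented for illustration): a negotiation timeout should stay retriable as long 
as the caller's overall deadline has not expired, so that a re-elected leader 
gets a chance to serve the write.

{code}
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical status type standing in for kudu::Status in this sketch.
struct Status {
  bool ok = false;
  bool is_timeout = false;
};

// Retry an RPC attempt on timeout until the caller-provided deadline passes.
// The point here is that a connection-negotiation timeout should fall into
// this retriable bucket instead of failing the write immediately.
Status RetryUntilDeadline(const std::function<Status()>& attempt,
                          std::chrono::steady_clock::time_point deadline) {
  Status last;
  while (std::chrono::steady_clock::now() < deadline) {
    last = attempt();
    if (last.ok || !last.is_timeout) {
      return last;  // success, or a genuinely non-retriable error
    }
    // Timed out (e.g. negotiating with a frozen leader): back off briefly and
    // retry; a leader re-election may have completed in the meantime.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
  }
  return last;
}
{code}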





[jira] [Updated] (KUDU-1580) Connection negotiation timeout to tablet server is treated as unretriable error

2016-10-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated KUDU-1580:
--
Priority: Critical  (was: Major)

> Connection negotiation timeout to tablet server is treated as unretriable 
> error
> ---
>
> Key: KUDU-1580
> URL: https://issues.apache.org/jira/browse/KUDU-1580
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.10.0
>Reporter: Todd Lipcon
>Priority: Critical
>
> In the case that the leader tablet server is up but "frozen", the client will 
> get a connection negotiation timeout trying to establish an RPC connection. 
> It appears that this Status::TimeOut() is treated as a non-retriable error by 
> WriteRpc::AnalyzeResponse, so the client gets a failure even if there has 
> been a leader re-election within the client-provided deadline.





[jira] [Commented] (KUDU-1153) API should allow introspection of current split points

2016-10-10 Thread Matthew Jacobs (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563706#comment-15563706
 ] 

Matthew Jacobs commented on KUDU-1153:
--

This probably needs to be captured more generally once again, since specifying 
explicit ranges is now recommended over split points. This is pretty important 
because Impala will support the new range-partitioning syntax soon, and 
dropping partitions is mostly unusable if you cannot discover, via SHOW 
PARTITIONS, which partitions have been created.

> API should allow introspection of current split points
> --
>
> Key: KUDU-1153
> URL: https://issues.apache.org/jira/browse/KUDU-1153
> Project: Kudu
>  Issue Type: Improvement
>  Components: api, client, impala
>Affects Versions: Private Beta
>Reporter: Martin Grund
>
> Impala needs a way to do introspection on the partition schema to give the 
> user a reasonable "SHOW CREATE TABLE". For hash-bucketed tables this can be 
> achieved, but for range-partitioned tables it is not possible.
> One solution would be to iterate over all the tablets, fetch the stop 
> partition keys, and extract the range component; however, this would require 
> the PartitionSchema to have a method to decode a PartialRow from a byte[].


