Todd:

Thanks for the update. So Kudu is not designed to be a common storage
system for long-term data, streaming, and random access? Just curious.

On Wed, May 18, 2016 at 3:38 PM, Todd Lipcon <t...@cloudera.com> wrote:

> Hm, so each of the strings is about 27 bytes, and with ~1,100 such columns
> each row is roughly 27KB. So a batch size of 500 is still >13MB. I'd start
> with something very low like 10, and work your way up. That said, this is
> definitely not in the "standard" use cases for which Kudu has been
> designed.
>
> I'd also recommend using compression and/or dictionary encoding for a
> table if you have many repeated values. Unfortunately, it's not currently
> possible to do this when creating a table using Impala.
>
> -Todd
>
> On Wed, May 18, 2016 at 10:51 AM, Abhi Basu <9000r...@gmail.com> wrote:
>
>> Query: describe kudu_db.chr22_kudu
>> +-------------+--------+---------+
>> | name        | type   | comment |
>> +-------------+--------+---------+
>> | pos         | int    |         |
>> | id          | string |         |
>> | chrom       | string |         |
>> | ref         | string |         |
>> | alt         | string |         |
>> | qual        | string |         |
>> | filter      | string |         |
>> | info        | string |         |
>> | format_type | string |         |
>> | hg00096     | string |         |
>> | hg00097     | string |         |
>> | hg00099     | string |         |
>> | hg00100     | string |         |
>> | hg00101     | string |         |
>> | hg00102     | string |         |
>> | hg00103     | string |         |
>> | hg00104     | string |         |
>>
>> ..........
>>
>> and so on, all the way to column na20828 (string).
>>
>> Each of the hg* and na* columns has values like:
>> | hg00096                    |
>> +----------------------------+
>> | 0|0:0.000:0.00,-5.00,-5.00 |
>> | 0|0:0.000:0.00,-5.00,-5.00 |
>> | 0|0:0.000:0.00,-5.00,-5.00 |
>> | 0|0:0.000:0.00,-5.00,-5.00 |
>> | 0|0:0.000:0.00,-5.00,-5.00 |
>> | 0|0:0.000:0.00,-5.00,-5.00 |
>> | 0|0:0.000:0.00,-5.00,-5.00 |
>> | 0|0:0.000:0.00,-5.00,-5.00 |
>> | 0|0:0.000:0.00,-5.00,-5.00 |
>> | 0|0:0.000:0.00,-5.00,-5.00 |
>>
>>
>>
>> On Wed, May 18, 2016 at 10:47 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>>> What are the types of your 1000 columns? Maybe an even smaller batch
>>> size is necessary.
>>>
>>> -Todd
>>>
>>> On Wed, May 18, 2016 at 10:41 AM, Abhi Basu <9000r...@gmail.com> wrote:
>>>
>>>> I have tried with batch_size=500 and still get the same error. For
>>>> reference, here is some info that may help diagnose the problem:
>>>>
>>>> Error: Error while applying Kudu session.: Incomplete: not enough space
>>>> remaining in buffer for op (required 46.7K, 7.00M already used
>>>>
>>>>
>>>> Config settings:
>>>>
>>>> Kudu Tablet Server Block Cache Capacity   1 GB
>>>> Kudu Tablet Server Hard Memory Limit  16 GB
>>>>
>>>>
>>>> On Wed, May 18, 2016 at 8:26 AM, William Berkeley <
>>>> wdberke...@cloudera.com> wrote:
>>>>
>>>>> Both options are more or less the same idea: the point is that you
>>>>> need fewer rows going into each batch so you don't go over the buffer
>>>>> size limit. Follow what Todd said, as he explained it more clearly and
>>>>> suggested a better way.
>>>>>
>>>>> -Will
>>>>>
>>>>> On Wed, May 18, 2016 at 10:45 AM, Abhi Basu <9000r...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the updates. I will give both options a try and report
>>>>>> back.
>>>>>>
>>>>>> If you are interested in testing with such datasets, I can help.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Abhi
>>>>>>
>>>>>> On Wed, May 18, 2016 at 6:25 AM, Todd Lipcon <t...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Abhi,
>>>>>>>
>>>>>>> Will is right that the error is client-side, and it's probably
>>>>>>> happening because your rows are so wide. Impala typically batches 1000
>>>>>>> rows at a time when inserting into Kudu, so if each of your rows is
>>>>>>> 7-8KB, that will overflow the max buffer size that Will mentioned. This
>>>>>>> seems quite probable if your data is 1000 columns of doubles or int64s
>>>>>>> (which are 8 bytes each).
>>>>>>>
>>>>>>> I don't think his suggested workaround will help, but you can try
>>>>>>> running 'set batch_size=500' before the create table or insert query.
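>>>>>>>
>>>>>>> A sketch of that in impala-shell (table names taken from the DESCRIBE
>>>>>>> output earlier in the thread, assuming the HDFS-backed source table is
>>>>>>> named chr22; the Kudu-specific create clauses are elided):
>>>>>>>
>>>>>>>   set batch_size=500;
>>>>>>>   create table kudu_db.chr22_kudu ... as select * from chr22;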
>>>>>>>
>>>>>>> In terms of max supported columns, most of the workloads we are
>>>>>>> focusing on are more like typical data-warehouse tables, on the order
>>>>>>> of a couple hundred columns. Crossing into the 1000+ range enters
>>>>>>> "uncharted territory", where it's much more likely you'll hit problems
>>>>>>> like this and quite possibly others as well. I'll be interested to hear
>>>>>>> about your experiences, though you should probably be prepared for some
>>>>>>> rough edges.
>>>>>>>
>>>>>>> -Todd
>>>>>>>
>>>>>>> On Tue, May 17, 2016 at 8:32 PM, William Berkeley <
>>>>>>> wdberke...@cloudera.com> wrote:
>>>>>>>
>>>>>>>> Hi Abhi.
>>>>>>>>
>>>>>>>> I believe that error is actually coming from the client, not the
>>>>>>>> server. See e.g.
>>>>>>>> https://github.com/apache/incubator-kudu/blob/master/src/kudu/client/batcher.cc#L787
>>>>>>>> (NB: that link is to the master branch, not the exact release you are
>>>>>>>> using).
>>>>>>>>
>>>>>>>> If you look around there, you'll see that the max is set by something
>>>>>>>> called max_buffer_size_, which appears to be hardcoded to
>>>>>>>> 7 * 1024 * 1024 bytes = 7MiB (and this is consistent with
>>>>>>>> 6.96 + 0.0467 > 7).
>>>>>>>>
>>>>>>>> I think the simple workaround would be to split the CTAS into a CTAS
>>>>>>>> plus an insert-as-select. Pick a condition that bipartitions the table,
>>>>>>>> so you don't get errors from trying to insert rows twice.
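>>>>>>>>
>>>>>>>> A rough sketch of the idea (the pos cutoff is arbitrary; any predicate
>>>>>>>> that splits the rows into two disjoint sets works, and the
>>>>>>>> Kudu-specific create clauses are elided):
>>>>>>>>
>>>>>>>>   create table chr22_kudu ... as select * from chr22 where pos < 25000000;
>>>>>>>>   insert into chr22_kudu  select * from chr22 where pos >= 25000000;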
>>>>>>>>
>>>>>>>> -Will
>>>>>>>>
>>>>>>>> On Tue, May 17, 2016 at 4:45 PM, Abhi Basu <9000r...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> What is the limit of columns in Kudu?
>>>>>>>>>
>>>>>>>>> I am using the 1000 Genomes dataset, specifically the chr22 table,
>>>>>>>>> which has 500,000 rows x 1,101 columns. This table has been built in
>>>>>>>>> Impala/HDFS. I am trying to create a new Kudu table as a select from
>>>>>>>>> that table. I get the following error:
>>>>>>>>>
>>>>>>>>> Error while applying Kudu session.: Incomplete: not enough space
>>>>>>>>> remaining in buffer for op (required 46.7K, 6.96M already used
>>>>>>>>>
>>>>>>>>> When looking at http://pcsd-cdh2.local.com:8051/mem-trackers, I
>>>>>>>>> see the following. What configuration needs to be tweaked?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Memory usage by subsystem:
>>>>>>>>>
>>>>>>>>> Id                                       Parent                                   Limit   Current Consumption  Peak Consumption
>>>>>>>>> root                                     none                                     50.12G  4.97M                6.08M
>>>>>>>>> block_cache-sharded_lru_cache            root                                     none    937.9K               937.9K
>>>>>>>>> code_cache-sharded_lru_cache             root                                     none    1B                   1B
>>>>>>>>> server                                   root                                     none    2.3K                 201.4K
>>>>>>>>> tablet-00000000000000000000000000000000  server                                   none    530B                 200.1K
>>>>>>>>> MemRowSet-6                              tablet-00000000000000000000000000000000  none    265B                 265B
>>>>>>>>> txn_tracker                              tablet-00000000000000000000000000000000  64.00M  0B                   28.5K
>>>>>>>>> DeltaMemStores                           tablet-00000000000000000000000000000000  none    265B                 87.8K
>>>>>>>>> log_block_manager                        server                                   none    1.8K                 2.7K
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> --
>>>>>>>>> Abhi Basu
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Abhi Basu
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Abhi Basu
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>>
>> --
>> Abhi Basu
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Abhi Basu
