Query: describe kudu_db.chr22_kudu
+-------------+--------+---------+
| name        | type   | comment |
+-------------+--------+---------+
| pos         | int    |         |
| id          | string |         |
| chrom       | string |         |
| ref         | string |         |
| alt         | string |         |
| qual        | string |         |
| filter      | string |         |
| info        | string |         |
| format_type | string |         |
| hg00096     | string |         |
| hg00097     | string |         |
| hg00099     | string |         |
| hg00100     | string |         |
| hg00101     | string |         |
| hg00102     | string |         |
| hg00103     | string |         |
| hg00104     | string |         |

..........

all the way to the last column, na20828 (string).

Each of the hg and na columns has values like:
| hg00096                    |
+----------------------------+
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |



On Wed, May 18, 2016 at 10:47 AM, Todd Lipcon <t...@cloudera.com> wrote:

> What are the types of your 1000 columns? Maybe an even smaller batch size
> is necessary.
>
> -Todd
>
> On Wed, May 18, 2016 at 10:41 AM, Abhi Basu <9000r...@gmail.com> wrote:
>
>> I have tried with batch_size=500 and still get the same error. For your
>> reference, I have attached info that may help diagnose the issue.
>>
>> Error: Error while applying Kudu session.: Incomplete: not enough space
>> remaining in buffer for op (required 46.7K, 7.00M already used
>>
>>
>> Config settings:
>>
>> Kudu Tablet Server Block Cache Capacity   1 GB
>> Kudu Tablet Server Hard Memory Limit  16 GB
>>
>>
>> On Wed, May 18, 2016 at 8:26 AM, William Berkeley <
>> wdberke...@cloudera.com> wrote:
>>
>>> Both options are more or less the same idea: the point is that you need
>>> fewer rows going in per batch so you don't go over the batch size limit.
>>> Follow what Todd said, as he explained it more clearly and suggested a
>>> better way.
>>>
>>> -Will
>>>
>>> On Wed, May 18, 2016 at 10:45 AM, Abhi Basu <9000r...@gmail.com> wrote:
>>>
>>>> Thanks for the updates. I will give both options a try and report back.
>>>>
>>>> If you are interested in testing with such datasets, I can help.
>>>>
>>>> Thanks,
>>>>
>>>> Abhi
>>>>
>>>> On Wed, May 18, 2016 at 6:25 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>
>>>>> Hi Abhi,
>>>>>
>>>>> Will is right that the error is client-side, and probably happening
>>>>> because your rows are so wide. Impala typically batches 1000 rows at a
>>>>> time when inserting into Kudu, so if each of your rows is 7-8KB, that
>>>>> will overflow the max buffer size that Will mentioned. This seems quite
>>>>> probable if your data is 1000 columns of doubles or int64s (which are
>>>>> 8 bytes each).
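Todd's arithmetic above can be sanity-checked with a rough calculation (a sketch only; the column count is taken from this thread, and the 8-bytes-per-value figure is an assumption since many of the actual columns are strings):

```python
# Rough check: does a 1000-row batch of ~1101 eight-byte columns overflow
# the 7 MiB client buffer? Value width is an assumption, not a measurement.
COLUMNS = 1101                  # chr22 table width (from this thread)
BYTES_PER_VALUE = 8             # assumed: double / int64 width
ROWS_PER_BATCH = 1000           # Impala's typical insert batch size
BUFFER_LIMIT = 7 * 1024 * 1024  # 7 MiB client-side buffer

batch_bytes = COLUMNS * BYTES_PER_VALUE * ROWS_PER_BATCH
print(batch_bytes, BUFFER_LIMIT, batch_bytes > BUFFER_LIMIT)
# → 8808000 7340032 True
```

So even under a conservative fixed-width assumption, a default-size batch is about 1.5 MB over the limit.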
>>>>>
>>>>> I don't think his suggested workaround will help, but you can try
>>>>> running 'set batch_size=500' before running the create table or insert
>>>>> query.
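Following the same arithmetic, a batch size that fits under the buffer can be estimated like this (a sketch; the per-row byte count is an assumption, and the variable-width string columns in the real table make rows wider, so extra headroom is wise):

```python
# Estimate the largest batch size whose rows stay under the 7 MiB client
# buffer. ROW_BYTES is a guess based on fixed-width values; real rows with
# string columns are wider, so a smaller value like 500 leaves headroom.
BUFFER_LIMIT = 7 * 1024 * 1024
ROW_BYTES = 1101 * 8            # assumed ~8 bytes per column
safe_batch = BUFFER_LIMIT // ROW_BYTES
print(safe_batch)               # → 833
```

This is consistent with Todd's suggestion: `set batch_size=500` sits comfortably below the estimated ceiling.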
>>>>>
>>>>> In terms of max supported columns, most of the workloads we are
>>>>> focusing on are more like typical data-warehouse tables, on the order
>>>>> of a couple hundred columns. Crossing into the 1000+ range enters
>>>>> "uncharted territory" where it's much more likely you'll hit problems
>>>>> like this, and quite possibly others as well. I will be interested to
>>>>> hear your experiences, though you should probably be prepared for some
>>>>> rough edges.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Tue, May 17, 2016 at 8:32 PM, William Berkeley <
>>>>> wdberke...@cloudera.com> wrote:
>>>>>
>>>>>> Hi Abhi.
>>>>>>
>>>>>> I believe that error is actually coming from the client, not the
>>>>>> server. See e.g.
>>>>>> https://github.com/apache/incubator-kudu/blob/master/src/kudu/client/batcher.cc#L787
>>>>>> (NB: that link is to the master branch, not the exact release you are
>>>>>> using).
>>>>>>
>>>>>> If you look around there, you'll see that the max is set by something
>>>>>> called max_buffer_size_, which appears to be hardcoded to 7 * 1024 * 1024
>>>>>> bytes = 7MiB (and this is consistent with 6.96 + 0.0467 > 7).
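The figures quoted in the error message line up with that hardcoded limit (a quick check using the numbers from the error text above):

```python
# The client buffer is essentially full: 6.96 MB already used plus the
# 46.7 KB the next op requires exceeds the 7 MiB max_buffer_size_.
MAX_BUFFER = 7 * 1024 * 1024   # 7,340,032 bytes, hardcoded in batcher.cc
used = 6.96 * 1024 * 1024      # "6.96M already used"
required = 46.7 * 1024         # "required 46.7K"
print(used + required > MAX_BUFFER)  # → True
```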
>>>>>>
>>>>>> I think the simple workaround would be to split the CTAS into a CTAS
>>>>>> plus an INSERT ... SELECT. Pick a condition that bipartitions the
>>>>>> table, so you don't get errors from trying to insert rows twice.
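Concretely, the split might look something like the statements below (a sketch only: the source table name `chr22` and the `pos` cut-point are hypothetical, and the Kudu-specific CTAS clauses are elided):

```python
# Two statements whose WHERE clauses bipartition the source table, so each
# row is inserted exactly once. The cut-point and the HDFS-side table name
# are illustrative guesses, not values confirmed in this thread.
SPLIT_POS = 25_000_000  # hypothetical position cut-point on chr22

ctas = f"CREATE TABLE kudu_db.chr22_kudu AS SELECT * FROM chr22 WHERE pos < {SPLIT_POS}"
ins  = f"INSERT INTO kudu_db.chr22_kudu SELECT * FROM chr22 WHERE pos >= {SPLIT_POS}"

print(ctas)
print(ins)
```

Any disjoint, exhaustive pair of predicates works; a column like `pos` that is cheap to compare is a natural choice.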
>>>>>>
>>>>>> -Will
>>>>>>
>>>>>> On Tue, May 17, 2016 at 4:45 PM, Abhi Basu <9000r...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> What is the limit of columns in Kudu?
>>>>>>>
>>>>>>> I am using the 1000 Genomes dataset, specifically the chr22 table,
>>>>>>> which has 500,000 rows x 1101 columns. This table has been built in
>>>>>>> Impala/HDFS. I am trying to create a new Kudu table as a select from
>>>>>>> that table. I get the following error:
>>>>>>>
>>>>>>> Error while applying Kudu session.: Incomplete: not enough space
>>>>>>> remaining in buffer for op (required 46.7K, 6.96M already used
>>>>>>>
>>>>>>> When looking at http://pcsd-cdh2.local.com:8051/mem-trackers, I see
>>>>>>> the following. What configuration needs to be tweaked?
>>>>>>>
>>>>>>>
>>>>>>> Memory usage by subsystem:
>>>>>>>
>>>>>>> Id                                       Parent                                   Limit   Current  Peak
>>>>>>> root                                     none                                     50.12G  4.97M    6.08M
>>>>>>> block_cache-sharded_lru_cache            root                                     none    937.9K   937.9K
>>>>>>> code_cache-sharded_lru_cache             root                                     none    1B       1B
>>>>>>> server                                   root                                     none    2.3K     201.4K
>>>>>>> tablet-00000000000000000000000000000000  server                                   none    530B     200.1K
>>>>>>> MemRowSet-6                              tablet-00000000000000000000000000000000  none    265B     265B
>>>>>>> txn_tracker                              tablet-00000000000000000000000000000000  64.00M  0B       28.5K
>>>>>>> DeltaMemStores                           tablet-00000000000000000000000000000000  none    265B     87.8K
>>>>>>> log_block_manager                        server                                   none    1.8K     2.7K
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Abhi Basu
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Abhi Basu
>>>>
>>>
>>>
>>
>>
>> --
>> Abhi Basu
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Abhi Basu
