Query: describe kudu_db.chr22_kudu

+-------------+--------+---------+
| name        | type   | comment |
+-------------+--------+---------+
| pos         | int    |         |
| id          | string |         |
| chrom       | string |         |
| ref         | string |         |
| alt         | string |         |
| qual        | string |         |
| filter      | string |         |
| info        | string |         |
| format_type | string |         |
| hg00096     | string |         |
| hg00097     | string |         |
| hg00099     | string |         |
| hg00100     | string |         |
| hg00101     | string |         |
| hg00102     | string |         |
| hg00103     | string |         |
| hg00104     | string |         |
... and so on, all the way down to column na20828 (string). Each of the
hg* and na* columns has values like:

+----------------------------+
| hg00096                    |
+----------------------------+
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
| 0|0:0.000:0.00,-5.00,-5.00 |
+----------------------------+

On Wed, May 18, 2016 at 10:47 AM, Todd Lipcon <t...@cloudera.com> wrote:

> What are the types of your 1000 columns? Maybe an even smaller batch
> size is necessary.
>
> -Todd
>
> On Wed, May 18, 2016 at 10:41 AM, Abhi Basu <9000r...@gmail.com> wrote:
>
>> I have tried with batch_size=500 and still get the same error. For
>> your reference, attached is some info that may help diagnose.
>>
>> Error: Error while applying Kudu session.: Incomplete: not enough
>> space remaining in buffer for op (required 46.7K, 7.00M already used)
>>
>> Config settings:
>>
>> Kudu Tablet Server Block Cache Capacity: 1 GB
>> Kudu Tablet Server Hard Memory Limit: 16 GB
>>
>> On Wed, May 18, 2016 at 8:26 AM, William Berkeley
>> <wdberke...@cloudera.com> wrote:
>>
>>> Both options are more or less the same idea: the point is that you
>>> need fewer rows going in per batch, so you don't go over the batch
>>> size limit. Follow what Todd said, as he explained it more clearly
>>> and suggested a better way.
>>>
>>> -Will
>>>
>>> On Wed, May 18, 2016 at 10:45 AM, Abhi Basu <9000r...@gmail.com> wrote:
>>>
>>>> Thanks for the updates. I will give both options a try and report
>>>> back.
>>>>
>>>> If you are interested in testing with such datasets, I can help.
>>>>
>>>> Thanks,
>>>>
>>>> Abhi
>>>>
>>>> On Wed, May 18, 2016 at 6:25 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>
>>>>> Hi Abhi,
>>>>>
>>>>> Will is right that the error is client-side, and it is probably
>>>>> happening because your rows are so wide. Impala typically batches
>>>>> 1000 rows at a time when inserting into Kudu, so if each of your
>>>>> rows is 7-8KB, that will overflow the max buffer size that Will
>>>>> mentioned. This seems quite probable if your data is 1000 columns
>>>>> of doubles or int64s (which are 8 bytes each).
>>>>>
>>>>> I don't think his suggested workaround will help, but you can try
>>>>> running 'set batch_size=500' before running the create table or
>>>>> insert query.
>>>>>
>>>>> In terms of max supported columns, most of the workloads we are
>>>>> focusing on are more like typical data-warehouse tables, on the
>>>>> order of a couple hundred columns. Crossing into the 1000+ range
>>>>> enters "uncharted territory", where it's much more likely you'll
>>>>> hit problems like this and quite possibly others as well. I'll be
>>>>> interested to hear your experiences, though you should probably be
>>>>> prepared for some rough edges.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Tue, May 17, 2016 at 8:32 PM, William Berkeley
>>>>> <wdberke...@cloudera.com> wrote:
>>>>>
>>>>>> Hi Abhi.
>>>>>>
>>>>>> I believe that error is actually coming from the client, not the
>>>>>> server. See, e.g.,
>>>>>> https://github.com/apache/incubator-kudu/blob/master/src/kudu/client/batcher.cc#L787
>>>>>> (NB: that link is to the master branch, not the exact release you
>>>>>> are using).
>>>>>>
>>>>>> If you look around there, you'll see that the max is set by
>>>>>> something called max_buffer_size_, which appears to be hardcoded
>>>>>> to 7 * 1024 * 1024 bytes = 7MiB (and this is consistent with the
>>>>>> numbers in your error: 6.96 + 0.0467 > 7).
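>>>>>>
>>>>>> Back of the envelope, assuming each row op really needs the ~46.7K
>>>>>> your error reports: 7MiB / 46.7K is only about 150 rows, so a
>>>>>> default-sized batch can never fit. If you want to try shrinking
>>>>>> the batch instead, something like this in impala-shell before the
>>>>>> CTAS might squeeze under the limit (a sketch, untested against a
>>>>>> table this wide):
>>>>>>
>>>>>>   set batch_size=100;  -- ~100 rows * ~46.7K/row = ~4.6M < 7MiB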
>>>>>>
>>>>>> I think the simple workaround would be to split the CTAS into a
>>>>>> CTAS plus an INSERT ... SELECT. Pick a condition that bipartitions
>>>>>> the table, so you don't get errors from trying to insert rows
>>>>>> twice.
>>>>>>
>>>>>> -Will
>>>>>>
>>>>>> On Tue, May 17, 2016 at 4:45 PM, Abhi Basu <9000r...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> What is the limit on the number of columns in Kudu?
>>>>>>>
>>>>>>> I am using the 1000 Genomes dataset, specifically the chr22
>>>>>>> table, which has 500,000 rows x 1101 columns. This table has
>>>>>>> been built in Impala/HDFS. I am trying to create a new Kudu
>>>>>>> table as a select from that table. I get the following error:
>>>>>>>
>>>>>>> Error while applying Kudu session.: Incomplete: not enough space
>>>>>>> remaining in buffer for op (required 46.7K, 6.96M already used)
>>>>>>>
>>>>>>> When looking at http://pcsd-cdh2.local.com:8051/mem-trackers, I
>>>>>>> see the following. What configuration needs to be tweaked?
>>>>>>>
>>>>>>> Memory usage by subsystem:
>>>>>>>
>>>>>>> Id                                      | Parent                                  | Limit  | Current consumption | Peak consumption
>>>>>>> ----------------------------------------+-----------------------------------------+--------+---------------------+-----------------
>>>>>>> root                                    | none                                    | 50.12G | 4.97M               | 6.08M
>>>>>>> block_cache-sharded_lru_cache           | root                                    | none   | 937.9K              | 937.9K
>>>>>>> code_cache-sharded_lru_cache            | root                                    | none   | 1B                  | 1B
>>>>>>> server                                  | root                                    | none   | 2.3K                | 201.4K
>>>>>>> tablet-00000000000000000000000000000000 | server                                  | none   | 530B                | 200.1K
>>>>>>> MemRowSet-6                             | tablet-00000000000000000000000000000000 | none   | 265B                | 265B
>>>>>>> txn_tracker                             | tablet-00000000000000000000000000000000 | 64.00M | 0B                  | 28.5K
>>>>>>> DeltaMemStores                          | tablet-00000000000000000000000000000000 | none   | 265B                | 87.8K
>>>>>>> log_block_manager                       | server                                  | none   | 1.8K                | 2.7K
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Abhi Basu
>>>>>>
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>
>>>> --
>>>> Abhi Basu
>>>
>>
>> --
>> Abhi Basu
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Abhi Basu
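
A sketch of the two-step load Will describes above: CTAS one half of
the key space into the new Kudu table, then INSERT ... SELECT the
complementary half, so the two statements cover disjoint rows. The
DISTRIBUTE BY clause, bucket count, master address, the choice of pos
as the Kudu key column, and the split point 25000000 are illustrative
assumptions rather than values from the thread, and the storage-handler
TBLPROPERTIES form is the circa-2016 Impala_Kudu syntax; adjust all of
these to the actual deployment.

  -- Keep batches small enough for the client's 7MiB op buffer.
  set batch_size=100;

  -- Step 1: CTAS the first half of the table.
  create table kudu_db.chr22_kudu
  distribute by hash (pos) into 16 buckets  -- bucket count: placeholder
  tblproperties (
    'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
    'kudu.table_name' = 'chr22_kudu',
    'kudu.master_addresses' = 'pcsd-cdh2.local.com:7051',  -- placeholder
    'kudu.key_columns' = 'pos'  -- assumes pos uniquely identifies a row
  )
  as select * from chr22 where pos < 25000000;

  -- Step 2: insert the complementary half; the disjoint predicates
  -- guarantee no row is inserted twice.
  insert into kudu_db.chr22_kudu
  select * from chr22 where pos >= 25000000;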