On Wed, May 18, 2016 at 3:42 PM, Abhi Basu <9000r...@gmail.com> wrote:
> Todd:
>
> Thanks for the update. So Kudu is not designed to be a common storage
> system for long-term and streaming data/random access? Just curious.
>
I'd say it is, but right now we are focusing on more common use cases t
On Wed, May 18, 2016 at 3:38 PM, Todd Lipcon wrote:
Hm, so each of the strings is about 27 bytes, and with roughly 1,000 columns
that makes each row about 27KB. So a batch size of 500 is still >13MB. I'd
start with something very low like 10, and work your way up. That said, this
is definitely not in the
"standard" use cases for which Kudu has been designed.
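The arithmetic behind that advice can be sketched quickly. This is a rough check under the figures stated in the thread (~27-byte strings, ~1,000 columns, and the ~7MB buffer cap that shows up in the error message later on); the helper name is ours:

```python
# Rough sketch of the buffer math behind the batch-size advice.
# Assumptions: ~1,000 string columns of ~27 bytes each per row,
# and a ~7MB client-side mutation buffer (from the error message).

ROW_BYTES = 27 * 1000            # ~27KB per row
BUFFER_BYTES = 7 * 1000 * 1000   # ~7MB buffer limit

def batch_bytes(batch_size):
    """Approximate bytes buffered for one batch of rows."""
    return batch_size * ROW_BYTES

# A batch of 500 rows is >13MB, well past the buffer...
assert batch_bytes(500) > 13_000_000
# ...while a batch of 10 rows stays far under it.
assert batch_bytes(10) < BUFFER_BYTES
```

Working up from a tiny batch, as suggested, just means finding the largest multiple of the row width that still fits under the buffer cap.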
I'd also recommend using comp
Query: describe kudu_db.chr22_kudu
+-------+--------+---------+
| name  | type   | comment |
+-------+--------+---------+
| pos   | int    |         |
| id    | string |         |
| chrom | string |         |
| ref   | string |         |
| alt   | str
What are the types of your 1000 columns? Maybe an even smaller batch size
is necessary.
-Todd
On Wed, May 18, 2016 at 10:41 AM, Abhi Basu <9000r...@gmail.com> wrote:
I have tried with batch_size=500 and still get same error. For your
reference are attached info that may help diagnose.
Error: Error while applying Kudu session.: Incomplete: not enough space
remaining in buffer for op (required 46.7K, 7.00M already used)
Config settings:
Kudu Tablet Server Bloc
There is some code in review that needs some more refinement.
It will allow upsert/insert from a DataFrame using the Data Source API. It will
also allow the creation and deletion of tables from a DataFrame.
http://gerrit.cloudera.org:8080/#/c/2992/
Example usages will look something like:
http://ge
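Since the example link above is cut off, here is a hedged sketch of the shape such a DataFrame write might take. The option keys below follow the kudu.master/kudu.table convention from the kudu-spark integration, but treat the exact format string and keys as assumptions; the helper is shown as plain Python so the call shape is testable without a live cluster:

```python
# Hypothetical helper illustrating the option map a kudu-spark
# DataFrame write would take. The keys "kudu.master" and "kudu.table"
# follow the kudu-spark convention; treat them as assumptions here.

def kudu_write_options(master, table):
    """Build the option map for a DataFrame write to Kudu."""
    return {"kudu.master": master, "kudu.table": table}

opts = kudu_write_options("kudu-master:7051", "impala::kudu_db.chr22_kudu")

# With a live SparkSession this would be used roughly as:
#   df.write.format("org.apache.kudu.spark.kudu") \
#       .options(**opts).mode("append").save()
```

The master address and the Impala-style table name above are illustrative placeholders, not values from the thread.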
Can someone tell me what the state is of this Spark work?
Also, does anyone have any sample code on how to update/insert data in Kudu
using DataFrames?
Thanks,
Ben
> On Apr 13, 2016, at 8:22 AM, Chris George wrote:
>
> SparkSQL cannot support these types of statements but we may be able to
>
Both options are more or less the same idea: the point is you need fewer
rows going in per batch so you don't go over the batch size limit. Follow
what Todd said as he explained it more clearly and suggested a better way.
-Will
On Wed, May 18, 2016 at 10:45 AM, Abhi Basu <9000r...@gmail.com> wrote:
Thanks for the updates. I will give both options a try and report back.
If you are interested in testing with such datasets, I can help.
Thanks,
Abhi
On Wed, May 18, 2016 at 6:25 AM, Todd Lipcon wrote:
Hi Abhi,
Will is right that the error is client-side, and probably happening because
your rows are so wide. Impala typically will batch 1000 rows at a time when
inserting into Kudu, so if each of your rows is 7-8KB, that will overflow
the max buffer size that Will mentioned. This seems quite probab
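That overflow explanation checks out numerically. A quick sketch under the figures Todd states here (7-8KB rows, 1,000-row batches, and the ~7MB buffer from the error message):

```python
# Quick check of the overflow explanation: Impala's default ~1,000-row
# batch times 7-8KB per row meets or exceeds a ~7MB mutation buffer.

DEFAULT_BATCH = 1000
BUFFER_BYTES = 7 * 1000 * 1000   # ~7MB, per the error message

for row_kb in (7, 8):
    buffered = DEFAULT_BATCH * row_kb * 1000
    assert buffered >= BUFFER_BYTES  # every case fills or overflows the buffer
```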