Re: Performance Question

2016-07-18 Thread Todd Lipcon
On Mon, Jul 18, 2016 at 10:31 AM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Todd,
>
> Thanks for the info. I was going to upgrade after the testing, but now, it
> looks like I will have to do it earlier than expected.
>
> I will do the upgrade, then resume.
>

OK, sounds good. The upgrade shouldn't invalidate any performance testing
or anything -- just fixes this important bug.

-Todd


> On Jul 18, 2016, at 10:29 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> Hi Ben,
>
> Any chance that you are running Kudu 0.9.0 instead of 0.9.1? There's a
> known serious bug in 0.9.0 which can cause this kind of corruption.
>
> Assuming that you are running with replication count 3 this time, you
> should be able to move aside that tablet metadata file and start the
> server. It will recreate a new repaired replica automatically.
>
> -Todd
>
> On Mon, Jul 18, 2016 at 10:28 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> During my re-population of the Kudu table, I am getting this error trying
>> to restart a tablet server after it went down. The job that populates this
>> table has been running for over a week.
>>
>> [libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse
>> message of type "kudu.tablet.TabletSuperBlockPB" because it is missing
>> required fields: rowsets[2324].columns[15].block
>> F0718 17:01:26.783571   468 tablet_server_main.cc:55] Check failed:
>> _s.ok() Bad status: IO error: Could not init Tablet Manager: Failed to open
>> tablet metadata for tablet: 24637ee6f3e5440181ce3f20b1b298ba: Failed to
>> load tablet metadata for tablet id 24637ee6f3e5440181ce3f20b1b298ba: Could
>> not load tablet metadata from
>> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba: Unable
>> to parse PB from path:
>> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba
>> *** Check failure stack trace: ***
>> @   0x7d794d  google::LogMessage::Fail()
>> @   0x7d984d  google::LogMessage::SendToLog()
>> @   0x7d7489  google::LogMessage::Flush()
>> @   0x7da2ef  google::LogMessageFatal::~LogMessageFatal()
>> @   0x78172b  (unknown)
>> @   0x344d41ed5d  (unknown)
>> @   0x7811d1  (unknown)
>>
>> Does anyone know what this means?
>>
>> Thanks,
>> Ben
>>
>>
>> On Jul 11, 2016, at 10:47 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>> On Mon, Jul 11, 2016 at 10:40 AM, Benjamin Kim <bbuil...@gmail.com>
>> wrote:
>>
>>> Todd,
>>>
>>> I had it at one replica. Do I have to recreate?
>>>
>>
>> We don't currently have the ability to "accept data loss" on a tablet (or
>> set of tablets). If the machine is gone for good, then currently the only
>> easy way to recover is to recreate the table. If this sounds really
>> painful, though, maybe we can work up some kind of tool you could use to
>> just recreate the missing tablets (with those rows lost).
>>
>> -Todd
>>
>>>
>>> On Jul 11, 2016, at 10:37 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>
>>> Hey Ben,
>>>
>>> Is the table that you're querying replicated? Or was it created with
>>> only one replica per tablet?
>>>
>>> -Todd
>>>
>>> On Mon, Jul 11, 2016 at 10:35 AM, Benjamin Kim <b...@amobee.com> wrote:
>>>
>>>> Over the weekend, a tablet server went down. It’s not coming back up.
>>>> So, I decommissioned it and removed it from the cluster. Then, I restarted
>>>> Kudu because I was getting a timeout exception trying to do counts on the
>>>> table. Now, when I try again, I get the same error.
>>>>
>>>> 16/07/11 17:32:36 WARN scheduler.TaskSetManager: Lost task 468.3 in
>>>> stage 0.0 (TID 603, prod-dc1-datanode167.pdc1i.gradientx.com):
>>>> com.stumbleupon.async.TimeoutException: Timed out after 3ms when
>>>> joining Deferred@712342716(state=PAUSED, result=Deferred@1765902299,
>>>> callback=passthrough -> scanner opened -> wakeup thread Executor task
>>>> launch worker-2, errback=openScanner errback -> passthrough -> wakeup
>>>> thread Executor task launch worker-2)
>>>> at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1177)
>>>> at com.stumbleupon.async.Deferred.join(Deferred.java:1045)
>>>> at org.kududb.client.KuduScanner.nextRows(KuduScanner.java:57)
>>>> at
>>>> org.kududb.spark.kudu.RowR

Re: Performance Question

2016-07-18 Thread Todd Lipcon
Hi Ben,

Any chance that you are running Kudu 0.9.0 instead of 0.9.1? There's a
known serious bug in 0.9.0 which can cause this kind of corruption.

Assuming that you are running with replication count 3 this time, you
should be able to move aside that tablet metadata file and start the
server. It will recreate a new repaired replica automatically.
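
For example, a minimal sketch of moving it aside (the path is the one from the
log quoted below; the .bak suffix is arbitrary):

    mv /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba \
       /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba.bak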

-Todd

On Mon, Jul 18, 2016 at 10:28 AM, Benjamin Kim <bbuil...@gmail.com> wrote:

> During my re-population of the Kudu table, I am getting this error trying
> to restart a tablet server after it went down. The job that populates this
> table has been running for over a week.
>
> [libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse
> message of type "kudu.tablet.TabletSuperBlockPB" because it is missing
> required fields: rowsets[2324].columns[15].block
> F0718 17:01:26.783571   468 tablet_server_main.cc:55] Check failed:
> _s.ok() Bad status: IO error: Could not init Tablet Manager: Failed to open
> tablet metadata for tablet: 24637ee6f3e5440181ce3f20b1b298ba: Failed to
> load tablet metadata for tablet id 24637ee6f3e5440181ce3f20b1b298ba: Could
> not load tablet metadata from
> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba: Unable
> to parse PB from path:
> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba
> *** Check failure stack trace: ***
> @   0x7d794d  google::LogMessage::Fail()
> @   0x7d984d  google::LogMessage::SendToLog()
> @   0x7d7489  google::LogMessage::Flush()
> @   0x7da2ef  google::LogMessageFatal::~LogMessageFatal()
> @   0x78172b  (unknown)
> @   0x344d41ed5d  (unknown)
> @   0x7811d1  (unknown)
>
> Does anyone know what this means?
>
> Thanks,
> Ben
>
>
> On Jul 11, 2016, at 10:47 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> On Mon, Jul 11, 2016 at 10:40 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Todd,
>>
>> I had it at one replica. Do I have to recreate?
>>
>
> We don't currently have the ability to "accept data loss" on a tablet (or
> set of tablets). If the machine is gone for good, then currently the only
> easy way to recover is to recreate the table. If this sounds really
> painful, though, maybe we can work up some kind of tool you could use to
> just recreate the missing tablets (with those rows lost).
>
> -Todd
>
>>
>> On Jul 11, 2016, at 10:37 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>> Hey Ben,
>>
>> Is the table that you're querying replicated? Or was it created with only
>> one replica per tablet?
>>
>> -Todd
>>
>> On Mon, Jul 11, 2016 at 10:35 AM, Benjamin Kim <b...@amobee.com> wrote:
>>
>>> Over the weekend, a tablet server went down. It’s not coming back up.
>>> So, I decommissioned it and removed it from the cluster. Then, I restarted
>>> Kudu because I was getting a timeout exception trying to do counts on the
>>> table. Now, when I try again, I get the same error.
>>>
>>> 16/07/11 17:32:36 WARN scheduler.TaskSetManager: Lost task 468.3 in
>>> stage 0.0 (TID 603, prod-dc1-datanode167.pdc1i.gradientx.com):
>>> com.stumbleupon.async.TimeoutException: Timed out after 3ms when
>>> joining Deferred@712342716(state=PAUSED, result=Deferred@1765902299,
>>> callback=passthrough -> scanner opened -> wakeup thread Executor task
>>> launch worker-2, errback=openScanner errback -> passthrough -> wakeup
>>> thread Executor task launch worker-2)
>>> at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1177)
>>> at com.stumbleupon.async.Deferred.join(Deferred.java:1045)
>>> at org.kududb.client.KuduScanner.nextRows(KuduScanner.java:57)
>>> at org.kududb.spark.kudu.RowResultIteratorScala.hasNext(KuduRDD.scala:99)
>>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>> at
>>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
>>> at
>>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
>>> at
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>> at
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>> at
>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>> at org.apache.spark.

Re: Performance Question

2016-07-11 Thread Todd Lipcon
On Mon, Jul 11, 2016 at 10:40 AM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Todd,
>
> I had it at one replica. Do I have to recreate?
>

We don't currently have the ability to "accept data loss" on a tablet (or
set of tablets). If the machine is gone for good, then currently the only
easy way to recover is to recreate the table. If this sounds really
painful, though, maybe we can work up some kind of tool you could use to
just recreate the missing tablets (with those rows lost).

-Todd

>
> On Jul 11, 2016, at 10:37 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> Hey Ben,
>
> Is the table that you're querying replicated? Or was it created with only
> one replica per tablet?
>
> -Todd
>
> On Mon, Jul 11, 2016 at 10:35 AM, Benjamin Kim <b...@amobee.com> wrote:
>
>> Over the weekend, a tablet server went down. It’s not coming back up. So,
>> I decommissioned it and removed it from the cluster. Then, I restarted Kudu
>> because I was getting a timeout exception trying to do counts on the
>> table. Now, when I try again, I get the same error.
>>
>> 16/07/11 17:32:36 WARN scheduler.TaskSetManager: Lost task 468.3 in stage
>> 0.0 (TID 603, prod-dc1-datanode167.pdc1i.gradientx.com):
>> com.stumbleupon.async.TimeoutException: Timed out after 3ms when
>> joining Deferred@712342716(state=PAUSED, result=Deferred@1765902299,
>> callback=passthrough -> scanner opened -> wakeup thread Executor task
>> launch worker-2, errback=openScanner errback -> passthrough -> wakeup
>> thread Executor task launch worker-2)
>> at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1177)
>> at com.stumbleupon.async.Deferred.join(Deferred.java:1045)
>> at org.kududb.client.KuduScanner.nextRows(KuduScanner.java:57)
>> at org.kududb.spark.kudu.RowResultIteratorScala.hasNext(KuduRDD.scala:99)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at
>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
>> at
>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>> at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>> at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> Does anyone know how to recover from this?
>>
>> Thanks,
>> *Benjamin Kim*
>> *Data Solutions Architect*
>>
>> [a•mo•bee] *(n.)* the company defining digital marketing.
>>
>> *Mobile: +1 818 635 2900*
>> 3250 Ocean Park Blvd, Suite 200  |  Santa Monica, CA 90405  |
>> www.amobee.com
>>
>> On Jul 6, 2016, at 9:46 AM, Dan Burkert <d...@cloudera.com> wrote:
>>
>>
>>
>> On Wed, Jul 6, 2016 at 7:05 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Over the weekend, the row count is up to <500M. I will give it another
>>> few days to get to 1B rows. I still get consistent times ~15s for doing row
>>> counts despite the amount of data growing.
>>>
>>> On another note, I got a solicitation email from SnappyData to evaluate
>>> their product. They claim to be the “Spark Data Store” with tight
>>> integration with Spark executors. It claims to be an OLTP and OLAP system
>>> with being an in-memory data store first then to disk. After going to
>>> several Spark events, it would seem that this is the new “hot” area for
>>> vendors. They all (MemSQL, Redis, Aerospike, Datastax, etc.) claim to be
>>> the b

Re: Performance Question

2016-07-11 Thread Todd Lipcon
Hey Ben,

Is the table that you're querying replicated? Or was it created with only
one replica per tablet?

-Todd

On Mon, Jul 11, 2016 at 10:35 AM, Benjamin Kim <b...@amobee.com> wrote:

> Over the weekend, a tablet server went down. It’s not coming back up. So,
> I decommissioned it and removed it from the cluster. Then, I restarted Kudu
> because I was getting a timeout exception trying to do counts on the
> table. Now, when I try again, I get the same error.
>
> 16/07/11 17:32:36 WARN scheduler.TaskSetManager: Lost task 468.3 in stage
> 0.0 (TID 603, prod-dc1-datanode167.pdc1i.gradientx.com):
> com.stumbleupon.async.TimeoutException: Timed out after 3ms when
> joining Deferred@712342716(state=PAUSED, result=Deferred@1765902299,
> callback=passthrough -> scanner opened -> wakeup thread Executor task
> launch worker-2, errback=openScanner errback -> passthrough -> wakeup
> thread Executor task launch worker-2)
> at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1177)
> at com.stumbleupon.async.Deferred.join(Deferred.java:1045)
> at org.kududb.client.KuduScanner.nextRows(KuduScanner.java:57)
> at org.kududb.spark.kudu.RowResultIteratorScala.hasNext(KuduRDD.scala:99)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
> at
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
> at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
> at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> Does anyone know how to recover from this?
>
> Thanks,
> *Benjamin Kim*
> *Data Solutions Architect*
>
> [a•mo•bee] *(n.)* the company defining digital marketing.
>
> *Mobile: +1 818 635 2900*
> 3250 Ocean Park Blvd, Suite 200  |  Santa Monica, CA 90405  |
> www.amobee.com
>
> On Jul 6, 2016, at 9:46 AM, Dan Burkert <d...@cloudera.com> wrote:
>
>
>
> On Wed, Jul 6, 2016 at 7:05 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Over the weekend, the row count is up to <500M. I will give it another
>> few days to get to 1B rows. I still get consistent times ~15s for doing row
>> counts despite the amount of data growing.
>>
>> On another note, I got a solicitation email from SnappyData to evaluate
>> their product. They claim to be the “Spark Data Store” with tight
>> integration with Spark executors. It claims to be an OLTP and OLAP system
>> with being an in-memory data store first then to disk. After going to
>> several Spark events, it would seem that this is the new “hot” area for
>> vendors. They all (MemSQL, Redis, Aerospike, Datastax, etc.) claim to be
>> the best "Spark Data Store”. I’m wondering if Kudu will become this too?
>> With the performance I’ve seen so far, it would seem that it can be a
>> contender. All that is needed is a hardened Spark connector package, I
>> would think. The next evaluation I will be conducting is to see if
>> SnappyData’s claims are valid by doing my own tests.
>>
>
> It's hard to compare Kudu against any other data store without a lot of
> analysis and thorough benchmarking, but it is certainly a goal of Kudu to
> be a great platform for ingesting and analyzing data through Spark.  Up
> till this point most of the Spark work has been community driven, but more
> thorough integration testing of the Spark connector is going to be a focus
> going forward.
>
> - Dan
>
>
>
>> Cheers,
>> Ben
>>
>>
>>
>> On Jun 15, 2016, at 12:47 AM, Todd Lipcon <t...@cloudera.co

Re: Performance Question

2016-07-01 Thread Todd Lipcon
On Thu, Jun 30, 2016 at 5:39 PM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Hi Todd,
>
> I changed the key to be what you suggested, and I can’t tell the
> difference since it was already fast. But, I did get more numbers.
>

Yea, you won't see a substantial difference until you're inserting billions
of rows, etc, and the keys and/or bloom filters no longer fit in cache.


>
> > 104M rows in Kudu table
> - read: 8s
> - count: 16s
> - aggregate: 9s
>
> The time to read took much longer, going from 0.2s to 8s; counts were the
> same at 16s, and aggregate queries took longer, from 6s to 9s.
>

> I’m still impressed.
>

We aim to please ;-) If you have any interest in writing up these
experiments as a blog post, it would be cool to post them for others to learn
from.

-Todd


> On Jun 15, 2016, at 12:47 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> Hi Benjamin,
>
> What workload are you using for benchmarks? Using spark or something more
> custom? rdd or data frame or SQL, etc? Maybe you can share the schema and
> some queries
>
> Todd
>
> Todd
> On Jun 15, 2016 8:10 AM, "Benjamin Kim" <bbuil...@gmail.com> wrote:
>
>> Hi Todd,
>>
>> Now that Kudu 0.9.0 is out, I have done some tests. Already, I am
>> impressed. Compared to HBase, read and write performance are better. Write
>> performance has the greatest improvement (> 4x), while read is > 1.5x.
>> Albeit, these are only preliminary tests. Do you know of a way to really do
>> some conclusive tests? I want to see if I can match your results on my 50
>> node cluster.
>>
>> Thanks,
>> Ben
>>
>> On May 30, 2016, at 10:33 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>> On Sat, May 28, 2016 at 7:12 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Todd,
>>>
>>> It sounds like Kudu can possibly top or match those numbers put out by
>>> Aerospike. Do you have any performance statistics published or any
>>> instructions on how to measure them myself as a good way to test? In addition,
>>> this will be a test using Spark, so should I wait for Kudu version 0.9.0
>>> where support will be built in?
>>>
>>
>> We don't have a lot of benchmarks published yet, especially on the write
>> side. I've found that thorough cross-system benchmarks are very difficult
>> to do fairly and accurately, and often times users end up misguided if they
>> pay too much attention to them :) So, given a finite number of developers
>> working on Kudu, I think we've tended to spend more time on the project
>> itself and less time focusing on "competition". I'm sure there are use
>> cases where Kudu will beat out Aerospike, and probably use cases where
>> Aerospike will beat Kudu as well.
>>
>> From my perspective, it would be great if you can share some details of
>> your workload, especially if there are some areas you're finding Kudu
>> lacking. Maybe we can spot some easy code changes we could make to improve
>> performance, or suggest a tuning variable you could change.
>>
>> -Todd
>>
>>
>>> On May 27, 2016, at 9:19 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>>
>>> On Fri, May 27, 2016 at 8:20 PM, Benjamin Kim <bbuil...@gmail.com>
>>> wrote:
>>>
>>>> Hi Mike,
>>>>
>>>> First of all, thanks for the link. It looks like an interesting read. I
>>>> checked that Aerospike is currently at version 3.8.2.3, and in the article,
>>>> they are evaluating version 3.5.4. The main thing that impressed me was
>>>> their claim that they can beat Cassandra and HBase by 8x for writing and
>>>> 25x for reading. Their big claim to fame is that Aerospike can write 1M
>>>> records per second with only 50 nodes. I wanted to see if this is real.
>>>>
>>>
>>> 1M records per second on 50 nodes is pretty doable by Kudu as well,
>>> depending on the size of your records and the insertion order. I've been
>>> playing with a ~70 node cluster recently and seen 1M+ writes/second
>>> sustained, and bursting above 4M. These are 1KB rows with 11 columns, and
>>> with pretty old HDD-only nodes. I think newer flash-based nodes could do
>>> better.
>>>
>>>
>>>>
>>>> To answer your questions, we have a DMP with user profiles with many
>>>> attributes. We create segmentation information off of these attributes to
>>>> classify them. Then, we can target advertising appropriately for our sales
>>>> department. Much of the da

[ANNOUNCE] Apache Kudu (incubating) 0.9.1 released

2016-07-01 Thread Todd Lipcon
The Apache Kudu (incubating) team is happy to announce the release of
Kudu 0.9.1!

Kudu is an open source storage engine for structured data which
supports low-latency random access together with efficient analytical
access patterns. It is designed within the context of the Apache Hadoop
ecosystem and supports many integrations with other data analytics projects
both inside and outside of the Apache Software Foundation.

This is a bug-fix release which fixes several issues in the prior 0.9.0
release. See
http://kudu.incubator.apache.org/releases/0.9.1/docs/release_notes.html for
more information on the resolved issues.

The release can be downloaded from:
http://kudu.incubator.apache.org/releases/0.9.1/

Regards,
The Apache Kudu (incubating) team
===

Apache Kudu (incubating) is an effort undergoing incubation at The
Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further
review indicates that the infrastructure, communications, and decision
making process have stabilized in a manner consistent with other
successful ASF projects. While incubation status is not necessarily a
reflection of the completeness or stability of the code, it does indicate
that the project has yet to be fully endorsed by the ASF.


Re: Performance Question

2016-06-29 Thread Todd Lipcon
On Wed, Jun 29, 2016 at 2:18 PM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Todd,
>
> FYI. The key is unique for every row so rows are not going to already
> exist. Basically, everything is an INSERT.
>
> val generateUUID = udf(() => UUID.randomUUID().toString)
>
> As you can see, we are using UUID java library to create the key.
>

OK. You will have better insert performance if instead your key is
something that is increasing with time (eg System.currentTimeMillis() +
UUID).
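
As an illustrative sketch only (the helper name and exact key format are
assumptions, not from this thread), the UDF could become something like:

    import java.util.UUID
    import org.apache.spark.sql.functions.udf

    // Prefix the random UUID with the current epoch millis, zero-padded so
    // string ordering matches time ordering; new rows then land near the
    // "hot" end of the key space instead of at random positions.
    val generateKey = udf(() =>
      f"${System.currentTimeMillis()}%013d-${UUID.randomUUID().toString}")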

-Todd


> On Jun 29, 2016, at 1:32 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
> On Wed, Jun 29, 2016 at 11:32 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Todd,
>>
>> I started Spark streaming more events into Kudu. Performance is great
>> there too! With HBase, it’s fast too, but I noticed that it pauses here and
>> there, making it take seconds for > 40k rows at a time, while Kudu doesn’t.
>> The progress bar just blinks by. I will keep this running until it hits 1B
>> rows and rerun my performance tests. This, hopefully, will give better
>> numbers.
>>
>
> Cool! We have invested a lot of work in making Kudu have consistent
> performance, like you mentioned. It's generally been my experience that
> most mature ops people would prefer a system which consistently performs
> well rather than one which has higher peak performance but occasionally
> stalls.
>
> BTW, what is your row key design? One exception to the above is that, if
> you're doing random inserts, you may see performance "fall off a cliff"
> once the size of your key columns becomes larger than the aggregate memory
> size of your cluster, if you're running on hard disks. Our inserts require
> checks for duplicate keys, and that can cause random disk IOs if your keys
> don't fit comfortably in cache. This is one area that HBase is
> fundamentally going to be faster based on its design.
>
> -Todd
>
>
>> On Jun 28, 2016, at 4:26 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>> Cool, thanks for the report, Ben. For what it's worth, I think there's
>> still some low hanging fruit in the Spark connector for Kudu (for example,
>> I believe locality on reads is currently broken). So, you can expect
>> performance to continue to improve in future versions. I'd also be
>> interested to see results on Kudu for a much larger dataset - my guess is a
>> lot of the 6 seconds you're seeing is constant overhead from Spark job
>> setup, etc, given that the performance doesn't seem to get slower as you
>> went from 700K rows to 13M rows.
>>
>> -Todd
>>
>> On Tue, Jun 28, 2016 at 3:03 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> FYI.
>>>
>>> I did a quick-n-dirty performance test.
>>>
>>> First, the setup:
>>> QA cluster:
>>>
>>>- 15 data nodes
>>>   - 64GB memory each
>>>   - HBase is using 4GB of memory
>>>   - Kudu is using 1GB of memory
>>>- 1 HBase/Kudu master node
>>>   - 64GB memory
>>>   - HBase/Kudu master is using 1GB of memory each
>>>- 10Gb Ethernet
>>>
>>>
>>> Using Spark on both to load/read events data (84 columns per row), I was
>>> able to record performance for each. On the HBase side, I used the Phoenix
>>> 4.7 Spark plugin where DataFrames can be used directly. On the Kudu side, I
>>> used the Spark connector. I created an events table in Phoenix using the
>>> CREATE TABLE statement and created the equivalent in Kudu using the Spark
>>> method based off of a DataFrame schema.
>>>
>>> Here are the numbers for Phoenix/HBase.
>>> 1st run:
>>> > 715k rows
>>> - write: 2.7m
>>>
>>> > 715k rows in HBase table
>>> - read: 0.1s
>>> - count: 3.8s
>>> - aggregate: 61s
>>>
>>> 2nd run:
>>> > 5.2M rows
>>> - write: 11m
>>> * had 4 region servers go down, had to retry the 5.2M row write
>>>
>>> > 5.9M rows in HBase table
>>> - read: 8s
>>> - count: 3m
>>> - aggregate: 46s
>>>
>>> 3rd run:
>>> > 6.8M rows
>>> - write: 9.6m
>>>
>>> > 12.7M rows
>>> - read: 10s
>>> - count: 3m
>>> - aggregate: 44s
>>>
>>>
>>> Here are the numbers for Kudu.
>>> 1st run:
>>> > 715k rows
>>> - write: 18s
>>>
>>> > 715k rows in Kudu table
>>> - read: 0.2s
>>> - count: 18s
>>> - aggregate: 5s
>>

Re: Performance Question

2016-06-29 Thread Todd Lipcon
On Wed, Jun 29, 2016 at 11:32 AM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Todd,
>
> I started Spark streaming more events into Kudu. Performance is great
> there too! With HBase, it’s fast too, but I noticed that it pauses here and
> there, making it take seconds for > 40k rows at a time, while Kudu doesn’t.
> The progress bar just blinks by. I will keep this running until it hits 1B
> rows and rerun my performance tests. This, hopefully, will give better
> numbers.
>

Cool! We have invested a lot of work in making Kudu have consistent
performance, like you mentioned. It's generally been my experience that
most mature ops people would prefer a system which consistently performs
well rather than one which has higher peak performance but occasionally
stalls.

BTW, what is your row key design? One exception to the above is that, if
you're doing random inserts, you may see performance "fall off a cliff"
once the size of your key columns becomes larger than the aggregate memory
size of your cluster, if you're running on hard disks. Our inserts require
checks for duplicate keys, and that can cause random disk IOs if your keys
don't fit comfortably in cache. This is one area that HBase is
fundamentally going to be faster based on its design.

-Todd


> On Jun 28, 2016, at 4:26 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
> Cool, thanks for the report, Ben. For what it's worth, I think there's
> still some low hanging fruit in the Spark connector for Kudu (for example,
> I believe locality on reads is currently broken). So, you can expect
> performance to continue to improve in future versions. I'd also be
> interested to see results on Kudu for a much larger dataset - my guess is a
> lot of the 6 seconds you're seeing is constant overhead from Spark job
> setup, etc, given that the performance doesn't seem to get slower as you
> went from 700K rows to 13M rows.
>
> -Todd
>
> On Tue, Jun 28, 2016 at 3:03 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> FYI.
>>
>> I did a quick-n-dirty performance test.
>>
>> First, the setup:
>> QA cluster:
>>
>>- 15 data nodes
>>   - 64GB memory each
>>   - HBase is using 4GB of memory
>>   - Kudu is using 1GB of memory
>>- 1 HBase/Kudu master node
>>   - 64GB memory
>>   - HBase/Kudu master is using 1GB of memory each
>>- 10Gb Ethernet
>>
>>
>> Using Spark on both to load/read events data (84 columns per row), I was
>> able to record performance for each. On the HBase side, I used the Phoenix
>> 4.7 Spark plugin where DataFrames can be used directly. On the Kudu side, I
>> used the Spark connector. I created an events table in Phoenix using the
>> CREATE TABLE statement and created the equivalent in Kudu using the Spark
>> method based off of a DataFrame schema.
>>
>> Here are the numbers for Phoenix/HBase.
>> 1st run:
>> > 715k rows
>> - write: 2.7m
>>
>> > 715k rows in HBase table
>> - read: 0.1s
>> - count: 3.8s
>> - aggregate: 61s
>>
>> 2nd run:
>> > 5.2M rows
>> - write: 11m
>> * had 4 region servers go down, had to retry the 5.2M row write
>>
>> > 5.9M rows in HBase table
>> - read: 8s
>> - count: 3m
>> - aggregate: 46s
>>
>> 3rd run:
>> > 6.8M rows
>> - write: 9.6m
>>
>> > 12.7M rows
>> - read: 10s
>> - count: 3m
>> - aggregate: 44s
>>
>>
>> Here are the numbers for Kudu.
>> 1st run:
>> > 715k rows
>> - write: 18s
>>
>> > 715k rows in Kudu table
>> - read: 0.2s
>> - count: 18s
>> - aggregate: 5s
>>
>> 2nd run:
>> > 5.2M rows
>> - write: 33s
>>
>> > 5.9M rows in Kudu table
>> - read: 0.2s
>> - count: 16s
>> - aggregate: 6s
>>
>> 3rd run:
>> > 6.8M rows
>> - write: 27s
>>
>> > 12.7M rows in Kudu table
>> - read: 0.2s
>> - count: 16s
>> - aggregate: 6s
>>
>> The Kudu results are impressive if you take these numbers as-is. Kudu is
>> close to 18x faster at writing (UPSERT). Kudu is 30x faster at reading
>> (HBase times increase as data size grows). Kudu is 7x faster at full row
>> counts. Lastly, Kudu is 3x faster doing an aggregate query (count distinct
>> event_id’s per user_id). *Remember that this is a small cluster, times are
>> still respectable for both systems, HBase could have been configured
>> better, and the HBase table could have been better tuned.
>>
>> Cheers,
>> Ben
>>
>>
>> On Jun 15, 2016, at 10:13 AM, Dan Burkert <d...@clou

Re: Performance Question

2016-06-28 Thread Todd Lipcon
Cool, thanks for the report, Ben. For what it's worth, I think there's
still some low hanging fruit in the Spark connector for Kudu (for example,
I believe locality on reads is currently broken). So, you can expect
performance to continue to improve in future versions. I'd also be
interested to see results on Kudu for a much larger dataset - my guess is a
lot of the 6 seconds you're seeing is constant overhead from Spark job
setup, etc, given that the performance doesn't seem to get slower as you
went from 700K rows to 13M rows.

-Todd

On Tue, Jun 28, 2016 at 3:03 PM, Benjamin Kim <bbuil...@gmail.com> wrote:

> FYI.
>
> I did a quick-n-dirty performance test.
>
> First, the setup:
> QA cluster:
>
>- 15 data nodes
>   - 64GB memory each
>   - HBase is using 4GB of memory
>   - Kudu is using 1GB of memory
>- 1 HBase/Kudu master node
>   - 64GB memory
>   - HBase/Kudu master is using 1GB of memory each
>- 10Gb Ethernet
>
>
> Using Spark on both to load/read events data (84 columns per row), I was
> able to record performance for each. On the HBase side, I used the Phoenix
> 4.7 Spark plugin where DataFrames can be used directly. On the Kudu side, I
> used the Spark connector. I created an events table in Phoenix using the
> CREATE TABLE statement and created the equivalent in Kudu using the Spark
> method based off of a DataFrame schema.
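
For reference, a rough sketch of what such a DataFrame-based read/count/aggregate
test can look like with the kudu-spark connector of that era; the master address,
table name, and column names are placeholders, and the option keys and format
string may differ across connector versions:

    import org.apache.spark.sql.functions.countDistinct

    // Assumes a SQLContext as provided by spark-shell.
    val events = sqlContext.read
      .options(Map("kudu.master" -> "kudu-master:7051", "kudu.table" -> "events"))
      .format("org.kududb.spark.kudu")
      .load()

    events.count()                          // full row count
    events.groupBy("user_id")
      .agg(countDistinct("event_id"))       // count distinct event_ids per user_id
      .count()                              // force execution of the aggregate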
>
> Here are the numbers for Phoenix/HBase.
> 1st run:
> > 715k rows
> - write: 2.7m
>
> > 715k rows in HBase table
> - read: 0.1s
> - count: 3.8s
> - aggregate: 61s
>
> 2nd run:
> > 5.2M rows
> - write: 11m
> * had 4 region servers go down, had to retry the 5.2M row write
>
> > 5.9M rows in HBase table
> - read: 8s
> - count: 3m
> - aggregate: 46s
>
> 3rd run:
> > 6.8M rows
> - write: 9.6m
>
> > 12.7M rows
> - read: 10s
> - count: 3m
> - aggregate: 44s
>
>
> Here are the numbers for Kudu.
> 1st run:
> > 715k rows
> - write: 18s
>
> > 715k rows in Kudu table
> - read: 0.2s
> - count: 18s
> - aggregate: 5s
>
> 2nd run:
> > 5.2M rows
> - write: 33s
>
> > 5.9M rows in Kudu table
> - read: 0.2s
> - count: 16s
> - aggregate: 6s
>
> 3rd run:
> > 6.8M rows
> - write: 27s
>
> > 12.7M rows in Kudu table
> - read: 0.2s
> - count: 16s
> - aggregate: 6s
>
> The Kudu results are impressive if you take these numbers as-is. Kudu is
> close to 18x faster at writing (UPSERT). Kudu is 30x faster at reading
> (HBase times increase as data size grows). Kudu is 7x faster at full row
> counts. Lastly, Kudu is 3x faster doing an aggregate query (count distinct
> event_id’s per user_id). *Remember that this is a small cluster, times are
> still respectable for both systems, HBase could have been configured
> better, and the HBase table could have been better tuned.
>
> Cheers,
> Ben
>
>
> On Jun 15, 2016, at 10:13 AM, Dan Burkert <d...@cloudera.com> wrote:
>
> When range partitioning, adding partition splits is done via the
> CreateTableOptions.addSplitRow
> <http://getkudu.io/apidocs/org/kududb/client/CreateTableOptions.html#addSplitRow-org.kududb.client.PartialRow->
> method.
> You can find more about the different partitioning options in the schema
> design guide <http://getkudu.io/docs/schema_design.html#data-distribution>.
> We generally recommend sticking to hash partitioning if possible, since you
> don't have to determine your own split rows.
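
A minimal sketch of that hash-partitioning route with the Java client (class and
method names as in the org.kududb-era API, so double-check against your version;
the column names, bucket count, and master address are assumptions):

    import scala.collection.JavaConverters._
    import org.kududb.ColumnSchema.ColumnSchemaBuilder
    import org.kududb.{Schema, Type}
    import org.kududb.client.{CreateTableOptions, KuduClient}

    val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()
    val schema = new Schema(List(
      new ColumnSchemaBuilder("key", Type.STRING).key(true).build(),
      new ColumnSchemaBuilder("event_type", Type.STRING).build()).asJava)

    // e.g. 50 nodes * 10 buckets per node = 500 hash-partitioned tablets
    val options = new CreateTableOptions()
      .addHashPartitions(List("key").asJava, 500)
      .setNumReplicas(3)

    client.createTable("events", schema, options)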
>
> - Dan
>
> On Wed, Jun 15, 2016 at 9:17 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Todd,
>>
>> I think the locality is not within our setup. We have the compute cluster
>> with Spark, YARN, etc. on its own, and we have the storage cluster with
>> HBase, Kudu, etc. on another. We beefed up the hardware specs on the
>> compute cluster and beefed up storage capacity on the storage cluster. We
>> got this setup idea from the Databricks folks. I do have a question. I
>> created the table to use range partition on columns. I see that if I use
>> hash partition I can set the number of splits, but how do I do that using
>> range (50 nodes * 10 = 500 splits)?
>>
>> Thanks,
>> Ben
>>
>>
>> On Jun 15, 2016, at 9:11 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>> Awesome use case. One thing to keep in mind is that spark parallelism
>> will be limited by the number of tablets. So, you might want to split into
>> 10 or so buckets per node to get the best query throughput.
>>
>> Usually if you run top on some machines while running the query you can
>> see if it is fully utilizing the cores

Re: Performance Question

2016-06-15 Thread Todd Lipcon
Awesome use case. One thing to keep in mind is that spark parallelism will
be limited by the number of tablets. So, you might want to split into 10 or
so buckets per node to get the best query throughput.

Usually if you run top on some machines while running the query you can see
if it is fully utilizing the cores.

Another known issue right now is that spark locality isn't working properly
on replicated tables so you will use a lot of network traffic. For a perf
test you might want to try a table with replication count 1
On Jun 15, 2016 5:26 PM, "Benjamin Kim" <bbuil...@gmail.com> wrote:

Hi Todd,

I did a simple test of our ad events. We stream using Spark Streaming
directly into HBase, and the Data Analysts/Scientists do some
insight/discovery work plus some reports generation. For the reports, we
use SQL, and the more deeper stuff, we use Spark. In Spark, our main data
currency store of choice is DataFrames.

The schema is around 83 columns wide where most are of the string data type.

"event_type", "timestamp", "event_valid", "event_subtype", "user_ip",
"user_id", "mappable_id",
"cookie_status", "profile_status", "user_status", "previous_timestamp",
"user_agent", "referer",
"host_domain", "uri", "request_elapsed", "browser_languages", "acamp_id",
"creative_id",
"location_id", “pcamp_id",
"pdomain_id", "continent_code", "country", "region", "dma", "city", "zip",
"isp", "line_speed",
"gender", "year_of_birth", "behaviors_read", "behaviors_written",
"key_value_pairs", "acamp_candidates",
"tag_format", "optimizer_name", "optimizer_version", "optimizer_ip",
"pixel_id", “video_id",
"video_network_id", "video_time_watched", "video_percentage_watched",
"video_media_type",
"video_player_iframed", "video_player_in_view", "video_player_width",
"video_player_height",
"conversion_valid_sale", "conversion_sale_amount",
"conversion_commission_amount", "conversion_step",
"conversion_currency", "conversion_attribution", "conversion_offer_id",
"custom_info", "frequency",
"recency_seconds", "cost", "revenue", “optimizer_acamp_id",
"optimizer_creative_id", "optimizer_ecpm", "impression_id",
"diagnostic_data",
"user_profile_mapping_source", "latitude", "longitude", "area_code",
"gmt_offset", "in_dst",
"proxy_type", "mobile_carrier", "pop", "hostname", "profile_expires",
"timestamp_iso", "reference_id",
"identity_organization", "identity_method"

Most queries are like counts of how many users use what browser, how many
are unique users, etc. The part that scares most users is when it comes to
joining this data with other dimension/3rd party events tables because of
the sheer size of it.

We do what most companies do, similar to what I saw in earlier
presentations of Kudu. We dump data out of HBase into partitioned Parquet
tables to make query performance manageable.

I will coordinate with a data scientist today to do some tests. He is
working on identity matching/record linking of users from 2 domains: US and
Singapore, using probabilistic deduping algorithms. I will load the data
from ad events from both countries, and let him run his process against
this data in Kudu. I hope this will “wow” the team.

Thanks,
Ben

On Jun 15, 2016, at 12:47 AM, Todd Lipcon <t...@cloudera.com> wrote:

Hi Benjamin,

What workload are you using for benchmarks? Using spark or something more
custom? rdd or data frame or SQL, etc? Maybe you can share the schema and
some queries

Todd

Todd
On Jun 15, 2016 8:10 AM, "Benjamin Kim" <bbuil...@gmail.com> wrote:

> Hi Todd,
>
> Now that Kudu 0.9.0 is out, I have done some tests. Already, I am
> impressed. Compared to HBase, read and write performance are better. Write
> performance has the greatest improvement (> 4x), while read is > 1.5x.
> Albeit, these are only preliminary tests. Do you know of a way to really do
> some conclusive tests? I want to see if I can match your results on my 50
> node cluster.
>
> Thanks,
> Ben
>
> On May 30, 2016, at 10:33 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> On Sat, May 28, 2016 at 7:12 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Todd,
>>
>> 

Re: Performance Question

2016-06-15 Thread Todd Lipcon
Hi Benjamin,

What workload are you using for benchmarks? Using spark or something more
custom? rdd or data frame or SQL, etc? Maybe you can share the schema and
some queries

Todd

Todd
On Jun 15, 2016 8:10 AM, "Benjamin Kim" <bbuil...@gmail.com> wrote:

> Hi Todd,
>
> Now that Kudu 0.9.0 is out, I have done some tests. Already, I am
> impressed. Compared to HBase, read and write performance are better. Write
> performance has the greatest improvement (> 4x), while read is > 1.5x.
> Albeit, these are only preliminary tests. Do you know of a way to really do
> some conclusive tests? I want to see if I can match your results on my 50
> node cluster.
>
> Thanks,
> Ben
>
> On May 30, 2016, at 10:33 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> On Sat, May 28, 2016 at 7:12 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Todd,
>>
>> It sounds like Kudu can possibly top or match those numbers put out by
>> Aerospike. Do you have any performance statistics published or any
>> instructions on how to measure them myself as a good way to test? In addition,
>> this will be a test using Spark, so should I wait for Kudu version 0.9.0
>> where support will be built in?
>>
>
> We don't have a lot of benchmarks published yet, especially on the write
> side. I've found that thorough cross-system benchmarks are very difficult
> to do fairly and accurately, and often times users end up misguided if they
> pay too much attention to them :) So, given a finite number of developers
> working on Kudu, I think we've tended to spend more time on the project
> itself and less time focusing on "competition". I'm sure there are use
> cases where Kudu will beat out Aerospike, and probably use cases where
> Aerospike will beat Kudu as well.
>
> From my perspective, it would be great if you can share some details of
> your workload, especially if there are some areas you're finding Kudu
> lacking. Maybe we can spot some easy code changes we could make to improve
> performance, or suggest a tuning variable you could change.
>
> -Todd
>
>
>> On May 27, 2016, at 9:19 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>> On Fri, May 27, 2016 at 8:20 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Hi Mike,
>>>
>>> First of all, thanks for the link. It looks like an interesting read. I
>>> checked that Aerospike is currently at version 3.8.2.3, and in the article,
>>> they are evaluating version 3.5.4. The main thing that impressed me was
>>> their claim that they can beat Cassandra and HBase by 8x for writing and
>>> 25x for reading. Their big claim to fame is that Aerospike can write 1M
>>> records per second with only 50 nodes. I wanted to see if this is real.
>>>
>>
>> 1M records per second on 50 nodes is pretty doable by Kudu as well,
>> depending on the size of your records and the insertion order. I've been
>> playing with a ~70 node cluster recently and seen 1M+ writes/second
>> sustained, and bursting above 4M. These are 1KB rows with 11 columns, and
>> with pretty old HDD-only nodes. I think newer flash-based nodes could do
>> better.
>>
>>
>>>
>>> To answer your questions, we have a DMP with user profiles with many
>>> attributes. We create segmentation information off of these attributes to
>>> classify them. Then, we can target advertising appropriately for our sales
>>> department. Much of the data processing is for applying models on all or if
>>> not most of every profile’s attributes to find similarities (nearest
>>> neighbor/clustering) over a large number of rows when batch processing or a
>>> small subset of rows for quick online scoring. So, our use case is a
>>> typical advanced analytics scenario. We have tried HBase, but it doesn’t
>>> work well for these types of analytics.
>>>
>>> I read, that Aerospike in the release notes, they did do many
>>> improvements for batch and scan operations.
>>>
>>> I wonder what your thoughts are for using Kudu for this.
>>>
>>
>> Sounds like a good Kudu use case to me. I've heard great things about
>> Aerospike for the low latency random access portion, but I've also heard
>> that it's _very_ expensive, and not particularly suited to the columnar
>> scan workload. Lastly, I think the Apache license of Kudu is much more
>> appealing than the AGPL3 used by Aerospike. But, that's not really a direct
>> answer to the performance question :)
>>
>>
>>>
>>> Thanks,
>>> Ben
>>>
>>>
>>> On

Re: Improving the insert performance with INSERT INTO SELECT - gFlagfile

2016-05-30 Thread Todd Lipcon
Hi Amit,

Answers below.

On Sun, May 29, 2016 at 11:37 AM, Amit Adhau  wrote:

> Hi,
>
> What is the significance of using the below gflags, which can help in
> improving the insert performance with the INSERT INTO SELECT clause.
>
> --num_tablets_to_open_simultaneously=8
>

This only affects the startup time of a tablet server, and should not
affect the insert performance at all.


> --scanner_batch_size_rows=1000
>

This only affects the read performance. I've seen it have a noticeable
effect at times, but it can also cause some memory management issues with
wider tables - that's why the default is 100. It won't affect write
performance at all.


>
> and maintenance_manager_num_threads(Kudu Tablet Server Maintenance
> Threads) in cloudera manager.
>
>
This could improve write performance, since it increases the number of
threads available to perform compaction and flushes. Assuming your hardware
looks like typical Hadoop nodes (eg 10-12 disks, 8-16 cores), I would try
setting it to 4 as a starting point.
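
For reference, the corresponding gflagfile entry would look roughly like this
(4 is only the starting point suggested above; tune from there):

    # tablet server gflagfile
    --maintenance_manager_num_threads=4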


> Playing with these configs gives, most of the time, errors like "Timed
> out: Failed to write batch of ops to tablet " OR "Illegal state: Tablet not
> RUNNING: NOT_STARTED"
>
>
Those errors seem to indicate you are probably trying to perform
reads/writes while the servers are still in the process of starting up.
Maybe you are not giving the cluster enough time to fully restart before
you are restarting the workload after changing the tuning?

-Todd


Re: Performance Question

2016-05-27 Thread Todd Lipcon
On Fri, May 27, 2016 at 8:20 PM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Hi Mike,
>
> First of all, thanks for the link. It looks like an interesting read. I
> checked that Aerospike is currently at version 3.8.2.3, and in the article,
> they are evaluating version 3.5.4. The main thing that impressed me was
> their claim that they can beat Cassandra and HBase by 8x for writing and
> 25x for reading. Their big claim to fame is that Aerospike can write 1M
> records per second with only 50 nodes. I wanted to see if this is real.
>

1M records per second on 50 nodes is pretty doable by Kudu as well,
depending on the size of your records and the insertion order. I've been
playing with a ~70 node cluster recently and seen 1M+ writes/second
sustained, and bursting above 4M. These are 1KB rows with 11 columns, and
with pretty old HDD-only nodes. I think newer flash-based nodes could do
better.


>
> To answer your questions, we have a DMP with user profiles with many
> attributes. We create segmentation information off of these attributes to
> classify them. Then, we can target advertising appropriately for our sales
> department. Much of the data processing is for applying models on all or if
> not most of every profile’s attributes to find similarities (nearest
> neighbor/clustering) over a large number of rows when batch processing or a
> small subset of rows for quick online scoring. So, our use case is a
> typical advanced analytics scenario. We have tried HBase, but it doesn’t
> work well for these types of analytics.
>
> I read, that Aerospike in the release notes, they did do many improvements
> for batch and scan operations.
>
> I wonder what your thoughts are for using Kudu for this.
>

Sounds like a good Kudu use case to me. I've heard great things about
Aerospike for the low latency random access portion, but I've also heard
that it's _very_ expensive, and not particularly suited to the columnar
scan workload. Lastly, I think the Apache license of Kudu is much more
appealing than the AGPL3 used by Aerospike. But, that's not really a direct
answer to the performance question :)


>
> Thanks,
> Ben
>
>
> On May 27, 2016, at 6:21 PM, Mike Percy <mpe...@cloudera.com> wrote:
>
> Have you considered whether you have a scan heavy or a random access heavy
> workload? Have you considered whether you always access / update a whole
> row vs only a partial row? Kudu is a column store so has some
> awesome performance characteristics when you are doing a lot of scanning of
> just a couple of columns.
>
> I don't know the answer to your question but if your concern is
> performance then I would be interested in seeing comparisons from a perf
> perspective on certain workloads.
>
> Finally, a year ago Aerospike did quite poorly in a Jepsen test:
> https://aphyr.com/posts/324-jepsen-aerospike
>
> I wonder if they have addressed any of those issues.
>
> Mike
>
> On Friday, May 27, 2016, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> I am just curious. How will Kudu compare with Aerospike (
>> http://www.aerospike.com)? I went to a Spark Roadshow and found out
>> about this piece of software. It appears to fit our use case perfectly
>> since we are an ad-tech company trying to leverage our user profiles data.
>> Plus, it already has a Spark connector and has a SQL-like client. The
>> tables can be accessed using Spark SQL DataFrames and, also, made into SQL
>> tables for direct use with Spark SQL ODBC/JDBC Thriftserver. I see from the
>> work done here http://gerrit.cloudera.org:8080/#/c/2992/ that the Spark
>> integration is well underway and, from the looks of it lately, almost
>> complete. I would prefer to use Kudu since we are already a Cloudera shop,
>> and Kudu is easy to deploy and configure using Cloudera Manager. I also
>> hope that some of Aerospike’s speed optimization techniques can make it
>> into Kudu in the future, if they have not been already thought of or
>> included.
>>
>> Just some thoughts…
>>
>> Cheers,
>> Ben
>
>
>
> --
> --
> Mike Percy
> Software Engineer, Cloudera
>
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Kudu Data Storage Size Mismatch on dashboard / data folder

2016-04-28 Thread Todd Lipcon
Hi Amit,

What you're probably seeing is container pre-allocation. We use 'fallocate'
to preallocate space in the block container files, to avoid fragmentation
on disk. So, the disk usage will increase in multiples of 32MB within each
data container file. You can see the number of active containers by looking
at the 'log_block_manager_containers' metric and subtracting
'log_block_manager_full_containers'.
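
For example, a URL along these lines should pull both metrics at once (host and
port as in your setup):

    http://kuduserver:8050/metrics?metrics=log_block_manager_containers,log_block_manager_full_containers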

-Todd

On Thu, Apr 28, 2016 at 6:03 AM, Amit Adhau <amit.ad...@globant.com> wrote:

> Hi Kudu team,
>
> I have the below observations for the Kudu data storage size mismatch:
>
> I had a similar observation last week when the total on-disk size was
> between 4-5GB and the data folder was showing 180GB. Hence, we cleaned all
> Kudu data and created new master and tablet data directories. That means
> Kudu had zero tables. After that, we created a new table in Kudu and
> inserted just a single record: 1461842027,'Test Event'.
>
> 1] Now, on the Kudu dashboard, the TOTAL On-Disk Size in Kudu is 250B for a
> single table having a single record [1461842027,'Test Event'].
>
> 2]
>
> Result for bytes_under is as per below;
>
> For Link:- http://kuduserver:8051/metrics?metrics=bytes_under
>
> "type": "server",
> "id": "kudu.master",
> "attributes": {},
> "metrics": [
> {
> "name": "log_block_manager_bytes_under_management",
> "value": 48549
> }
> ]
>
> For Link:- http://kuduserver:8050/metrics?metrics=bytes_under
>
> "type": "server",
> "id": "kudu.tabletserver",
> "attributes": {},
> "metrics": [
> {
> "name": "log_block_manager_bytes_under_management",
> "value": 4657
> }
> ]
>
> 3] And the data folder size on both tablet servers is 161MB.
>
> As confirmed earlier, the total on-disk size [1] should be equal to the data
> folder size [3], and probably [2] should also be in sync.
>
> Can you please suggest if this is normal or if it is an issue? This would
> be an important factor while planning for capacity.
>
> FYI, on our Kudu cluster, we have 2 tablet servers and a master [running as
> a tablet server as well].
>
>
> --
> Thanks & Regards,
>
> *Amit Adhau* | Data Architect
>
> *GLOBANT* | IND:+91 9821518132
>
> The information contained in this e-mail may be confidential. It has been
> sent for the sole use of the intended recipient(s). If the reader of this
> message is not an intended recipient, you are hereby notified that any
> unauthorized review, use, disclosure, dissemination, distribution or
> copying of this communication, or any of its contents,
> is strictly prohibited. If you have received it by mistake please let us
> know by e-mail immediately and delete it from your system. Many thanks.
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Exception at inserting big amount of data

2016-04-27 Thread Todd Lipcon
Hi Juan,

I see evidence of one issue in your log:

The 'master' server has errors about missing blocks across many of the
tablets. Is it possible that one of the drives hosting Kudu data got
unmounted or accidentally removed? Or perhaps the set of data directories
was changed after Kudu had been in use for a while?

I think this might be unrelated to the issue you're seeing, though --
according to the metrics, it's tablet '027fbba' which you're trying to
write to, but that doesn't seem to have any replicas on the node 'master'.

In terms of the tablet that is seeing writes, the odd thing is that the log
and metrics indicate that the writes are proceeding quite fast:
"name":
"handler_latency_kudu_tserver_TabletServerService_Write",
"total_count": 4891,
"min": 28,
"mean": 222.407,
"percentile_75": 82,
"percentile_95": 119,
"percentile_99": 1936,
"percentile_99_9": 20608,
"percentile_99_99": 25728,
"max": 25855,
"total_sum": 1087794
},

So there was no write operation which took longer than 26ms.

The other red flag in the log is the following error:
W0426 11:57:11.404397 11836 connection.cc:140] Shutting down connection
server connection from 10.0.6.6:39313 with pending inbound data (4/8575988
bytes received, last active 0 ns ago, status=Network error: the frame had a
length of 8575988, but we only support messages up to 8388608 bytes long.)

which is plausibly because the batches coming from the client are too
large. It's possible that the Java client doesn't check its batch size
before sending RPCs, and this is causing the server side to disconnect the
client.

Can you try either (a) run the tservers with the flag
--rpc_max_message_size=16777216 or (b) change the size of your manual
batches to be a bit smaller?
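
For option (b), a rough sketch of keeping client-side batches small with the
Java client's manual flush mode (shown here in Scala; the table, column, and
flush interval are illustrative only):

    import org.kududb.client.{KuduClient, SessionConfiguration}

    val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()
    val table = client.openTable("my_table")
    val session = client.newSession()
    session.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH)
    session.setMutationBufferSpace(1000)

    val rowsToInsert = Seq("a", "b", "c")    // placeholder data
    var pending = 0
    for (value <- rowsToInsert) {
      val insert = table.newInsert()
      insert.getRow.addString("key", value)  // column name is an assumption
      session.apply(insert)
      pending += 1
      if (pending >= 500) {                  // flush well before the ~8MB RPC limit
        session.flush()
        pending = 0
      }
    }
    session.flush()                          // flush any remainder
    session.close()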

This is obviously an area that we need to make more diagnosable, so thanks
for reporting the issue.

-Todd





On Tue, Apr 26, 2016 at 11:26 AM, Juan Pablo Briganti <
juan.briga...@globant.com> wrote:

> Hi Jean-Daniel
>
> Thanks for your response.
> As you said, master node has both roles: master and tablet server.
> I attach the log and metrics for both servers. Do not pay attention to
> the servers' times even if they don't match; I extracted both from a
> completely new run.
> If there is any problem with the log format or the uploaded files, please
> let me know and I'll try to generate them again.
> Let me add that, if I try to insert a small amount of data (10-20
> records), it works OK.
>
> Thanks again.
>
>
> Hi Juan Pablo,
>
> The error basically means that the client didn't hear from the server
> after sending the data, even after retrying a few times, and reached the
> default 10-second timeout. Can you run your insert again and then capture
> the output of this command?
>
> curl -s http://10.0.6.157:8050/metrics | gzip - > metrics.gz
>
> Then post that file somewhere we can download it. If you have more than one
> tablet server, it might be a different node; basically I want the one that
> ends up listed on the right side of this exception:
>
> Caused by: org.kududb.client.ConnectionResetException: [Peer
> f7e2936b040d4c58b52d90ae50ad6d5a] Connection reset on [id: 0x323019c2, /
> 10.0.6.6:58930 :> /10.0.6.157:7050]
>
> Also, can we see the logs from that node around 10AM on 16/04/26?
>
> Finally, I'm surprised you're even able to create your table if you only
> have one tablet server and a replication of 2 (unless you meant to say that
> your master node has both a master and a tablet server).
>
> J-D
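
As an aside on the default 10-second timeout mentioned above: if the Java
client is in use, the per-operation timeout can be adjusted when the client is
built. The snippet below is only a hypothetical illustration of where that
knob lives (the master address is made up); it is not a fix for the oversized
batches discussed in this thread.

    import org.kududb.client.KuduClient;

    public class ClientTimeoutExample {
      public static void main(String[] args) throws Exception {
        // Raise the per-operation timeout to 60 seconds (value is arbitrary).
        KuduClient client = new KuduClient.KuduClientBuilder("10.0.6.157:7051")
            .defaultOperationTimeoutMs(60000)
            .build();
        try {
          // Any simple call exercises the timeout setting, e.g. listing tables.
          System.out.println("Tables: " + client.getTablesList().getTablesList());
        } finally {
          client.shutdown();
        }
      }
    }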


-- 
Todd Lipcon
Software Engineer, Cloudera


Blog post on some YCSB optimization

2016-04-27 Thread Todd Lipcon
Hi all,

In case you don't use RSS to follow the blog, I figured I'd ping the
mailing list. I just published a post about some exploration I've been
doing using YCSB lately. Users might find it interesting:

http://getkudu.io/2016/04/26/ycsb.html

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Weekly update 4/25

2016-04-26 Thread Todd Lipcon
On Tue, Apr 26, 2016 at 10:14 AM, Jordan Birdsell <
jordan.birdsell.k...@statefarm.com> wrote:

> If we had to go less frequently than a day I’m sure it’d be acceptable.
> The volume of deletes is very low in this case.  In some tables we can just
> “erase” a column’s data but in others, based on the data design, we must
> delete the entire row or group of rows.
>

Thanks for the details.

I'm curious, are you solving this use case with an existing system today
(e.g. HBase, HDFS, or some RDBMS)? I'd like to compare our planned
implementation with whatever that system is doing to make sure ours is at
least as good.

-Todd
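
For readers who want to see the HBase analogy J-D draws further down this
thread (a TTL combined with MIN_VERSIONS=1, so old versions age out but the
row itself never disappears) spelled out, here is a rough, hypothetical sketch
using the HBase Java admin API. The table and column family names are
invented, and this is HBase configuration, not anything in Kudu itself.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class HistoryRetentionAnalogy {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          HColumnDescriptor family = new HColumnDescriptor("d");
          family.setMaxVersions(10);          // keep some history (arbitrary value)
          family.setTimeToLive(24 * 60 * 60); // versions older than a day may be purged...
          family.setMinVersions(1);           // ...but at least one version always survives

          HTableDescriptor table = new HTableDescriptor(TableName.valueOf("example_table"));
          table.addFamily(family);
          admin.createTable(table);
        }
      }
    }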


>
>
> *From:* Todd Lipcon [mailto:t...@cloudera.com]
> *Sent:* Tuesday, April 26, 2016 12:59 PM
>
> *To:* user@kudu.incubator.apache.org
> *Subject:* Re: Weekly update 4/25
>
>
>
> On Tue, Apr 26, 2016 at 8:28 AM, Jordan Birdsell <
> jordan.birdsell.k...@statefarm.com> wrote:
>
> Yes, this is exactly what we need to do. Not deleting immediately is OK for
> our current requirements; I'd say within a day would be ideal.
>
>
>
> Even within a day can be tricky for this kind of system if you have a
> fairly uniform random delete workload. That would imply that you're
> rewriting _all_ of your data every day, which uses a fair amount of IO.
>
>
>
> Are deletes extremely rare for your use case?
>
>
>
> Is it the entire row of data that has to be deleted or would it be
> sufficient to "X out" some particularly sensitive column?
>
>
>
> -Todd
>
>
>
>
>
> *From:* Jean-Daniel Cryans [mailto:jdcry...@apache.org]
> *Sent:* Tuesday, April 26, 2016 11:15 AM
>
>
> *To:* user@kudu.incubator.apache.org
> *Subject:* Re: Weekly update 4/25
>
>
>
> Oh I see, so this is in order to comply with asks such as "make sure that
> data for some user/customer is 100% deleted"? We'll still have the problem
> where we don't want to rewrite all the base data files (GBs/TBs) to clean
> up KBs of data, although since a single row is always only part of one
> rowset, it means it's at most 64MB that you'd be rewriting.
>
>
>
> BTW, is it OK if the data isn't immediately deleted? How long is it
> acceptable to wait before it happens?
>
>
>
> J-D
>
>
>
> On Tue, Apr 26, 2016 at 8:04 AM, Jordan Birdsell <
> jordan.birdsell.k...@statefarm.com> wrote:
>
> Correct.  As for the “latest version”, if a row is deleted in the latest
> version then removing the old versions where it existed is exactly what
> we’re looking to do.  Basically, we need a way to physically get rid of
> select rows (or data within a column for that matter) and all versions of
> that row or column data.
>
>
>
> *From:* Jean-Daniel Cryans [mailto:jdcry...@apache.org]
> *Sent:* Tuesday, April 26, 2016 10:56 AM
> *To:* user@kudu.incubator.apache.org
> *Subject:* Re: Weekly update 4/25
>
>
>
> Hi Jordan,
>
>
>
> In other words, you'd like to tag specific rows to be excluded from the
> default data history retention?
>
>
>
> Also, keep in mind that this improvement is about removing old versions of
> the data, it will not delete the latest version. If you are used to HBase,
> it's like specifying some TTL plus MIN_VERSIONS=1 so it doesn't completely
> age out a row.
>
>
>
> Hope this helps,
>
>
>
> J-D
>
>
>
> On Tue, Apr 26, 2016 at 4:29 AM, Jordan Birdsell <
> jordan.birdsell.k...@statefarm.com> wrote:
>
> Hi,
>
>
>
> Regarding row GC, I see in the design document that the tablet history
> max age will be set at the table level; would it be possible to make this
> something that can be overridden for specific transactions? We have some
> use cases that would require accelerated removal of data from disk and
> other use cases that would not have the same requirement. Unfortunately,
> these different use cases often apply to the same tables.
>
>
>
> Thanks,
>
> Jordan Birdsell
>
>
>
> *From:* Todd Lipcon [mailto:t...@apache.org]
> *Sent:* Monday, April 25, 2016 1:54 PM
> *To:* d...@kudu.incubator.apache.org; user@kudu.incubator.apache.org
> *Subject:* Weekly update 4/25
>
>
>
> Hey Kudu-ers,
>
>
>
> For the last month and a half, I've been posting weekly summaries of
> community development activity on the Kudu blog. In case you aren't on
> twitter or slack you might not have seen the posts, so I'm going to start
> emailing them to the list as well.
>
>
>
> Here's this week's update:
>
> http://getkudu.io/2016/04/25/weekly-update.html
>
>
>
> Feel free to reply to this mail if you have any questions or would like to
> get involved in development.
>
>
>
> -Todd
>
>
>
>
>
>
>
>
>
> --
>
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: where is kudu's dump core located?

2016-04-06 Thread Todd Lipcon
I also put up a patch which should fix the issue here:
http://gerrit.cloudera.org:8080/#/c/2725/
If you're able to rebuild from source, give it a try. It should apply
cleanly on top of 0.7.1.

If not, let me know and I can send you a binary to test out.

-Todd
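
Until a fixed build is deployed, one client-side precaution (a purely
hypothetical sketch, not something proposed in this thread) is to check the
encoded size of large STRING values before writing them: the analysis quoted
below points at a roughly 58KB value hitting a bug with index entries over
32KB, and, as mentioned further down, the documentation recommends keeping
values around 4KB or less.

    import java.nio.charset.StandardCharsets;

    public class ValueSizeGuard {
      // Thresholds taken from this thread: ~4KB is the documented recommendation,
      // and a single value over 32KB can trigger the index-block recursion bug
      // described in the quoted analysis below.
      private static final int WARN_BYTES = 4 * 1024;
      private static final int REJECT_BYTES = 32 * 1024;

      /** Returns true if the value looks safe to write, logging a warning if it is large. */
      public static boolean checkValueSize(String column, String value) {
        int size = value.getBytes(StandardCharsets.UTF_8).length;
        if (size >= REJECT_BYTES) {
          System.err.println("Refusing to write " + size + " bytes to column '" + column + "'");
          return false;
        }
        if (size > WARN_BYTES) {
          System.err.println("Warning: column '" + column + "' value is " + size + " bytes");
        }
        return true;
      }

      public static void main(String[] args) {
        // Hypothetical usage: only apply an insert if the big column passes the check.
        String bigValue = new String(new char[5000]).replace('\0', 'x');
        System.out.println("OK to write? " + checkValueSize("big_string_col", bigValue));
      }
    }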

On Tue, Apr 5, 2016 at 11:21 PM, Todd Lipcon <t...@cloudera.com> wrote:

> BTW, I filed https://issues.apache.org/jira/browse/KUDU-1396 for this
> bug. Thanks for helping us track it down!
>
> On Tue, Apr 5, 2016 at 11:05 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> Hi Darren,
>>
>> Thanks again for the core. I got a chance to look at it, and it looks to
>> me like you have a value which is 58KB in size, which is causing the issue
>> here. In particular, what seems to have happened is that there is an UPDATE
>> delta which is 58KB, and we have a bug in our handling of index blocks when
>> a single record is larger than 32KB. The bug causes an infinite recursion
>> which blows out the stack and crashes with the scenario you saw (if you
>> print out the backtrace all the way to stack frame #81872 you can see the
>> original call to AppendDelta which starts the recursion).
>>
>> Amusingly, there is this debug-level assertion in the code:
>>
>>   size_t est_size = idx_block->EstimateEncodedSize();
>>   if (est_size > options_->index_block_size) {
>>     DCHECK(idx_block->Count() > 1)
>>         << "Index block full with only one entry - this would create "
>>         << "an infinite loop";
>>     // This index block is full, flush it.
>>     BlockPointer index_block_ptr;
>>     RETURN_NOT_OK(FinishBlockAndPropagate(level));
>>   }
>>
>> which I wrote way back in October 2012 about 3 weeks into Kudu's initial
>> development. Unfortunately it looks like we never went back to actually
>> address the problem, and in release builds, it causes a crash (rather than
>> an assertion failure in debug builds).
>>
>> I believe given this information we can easily reproduce and fix the
>> issue. Unfortunately it's probably too late for the 0.8.0 release, which is
>> already being voted upon. Do you think you would be able to build from
>> source? If not, we can probably provide you with a patched binary off of
>> trunk at some point if you want to help us verify the fix rather than wait
>> a couple months until the next release.
>>
>> -Todd
>>
>>
>>
>> On Tue, Apr 5, 2016 at 6:33 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>>> On Tue, Apr 5, 2016 at 6:27 PM, Darren Hoo <darren@gmail.com> wrote:
>>>
>>>> Thanks Todd,
>>>>
>>>> let me try giving a little more details here.
>>>>
>>>> When I first created the table and loaded about 100k records, the Kudu
>>>> tablet server started to crash very often.
>>>>
>>>> So I suspected that maybe the data file was corrupted, so I dumped the
>>>> table as a Parquet file, dropped the table, recreated it, and imported
>>>> the Parquet file again.
>>>>
>>>> But after I did that, the tablet server still crashed often until I
>>>> increased the memory limit to 16GB; since then it has crashed less often,
>>>> about once every several days.
>>>>
>>>> There's one big STRING column in my table, but its values should not be
>>>> bigger than 4KB in size, as the Kudu documentation recommends.
>>>>
>>>
>>> OK, that's definitely an interesting part of the story. Although we
>>> think that 4k strings should be OK, the testing in this kind of workload
>>> has not been as extensive.
>>>
>>> If you are able to share the Parquet file and "create table" command for
>>> the dataset off-list, that would be great. I'll keep it only within our
>>> datacenter and delete it when done debugging.
>>>
>>>
>>>>
>>>> I will try to create a minimal dataset to reproduce the issue, but I am
>>>> not sure I can create one.
>>>>
>>>
>>> Thanks, that would be great if the larger dataset can't be shared.
>>>
>>>
>>>>
>>>> here's the core dump compressed,
>>>>
>>>> http://188.166.175.200/core.90197.bz2
>>>>
>>>> The exact Kudu version is 0.7.1-1.kudu0.7.1.p0.36 (installed from a
>>>> parcel).
>>>>
>>>>
>>> OK, thank you. I'm downloading it now and will take a look tonight or
>>> tomorrow.
>>>
>>> -Todd
>>>
>>>
>>>> On Wed, Apr 6, 2016 at 8:59 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>
>>>>> Hi Darren,
>>>>>
>>>>> This is interesting. I haven't seen a crash that looks like this, and
>>>>> I'm not sure why it would cause data to disappear either.
>>>>>
>>>>> By any chance do you have some workload that can reproduce the issue?
>>>>> e.g. a particular data set that you are loading that seems to be causing
>>>>> problems?
>>>>>
>>>>> Maybe you can gzip the core file and send it to me off-list if it
>>>>> isn't too large?
>>>>>
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Unsubscribe

2016-02-24 Thread Todd Lipcon
Please email user-unsubscribe@

-Todd

On Wed, Feb 24, 2016 at 10:48 AM, Andrea Ferretti
<ferrettiand...@gmail.com> wrote:
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [KUDU Tablet]unrecoverable crash

2016-02-19 Thread Todd Lipcon
BTW, you may not want to 'rm' those files, but rather move them aside so that
you don't lose data.

-Todd

On Fri, Feb 19, 2016 at 5:28 PM, Todd Lipcon <t...@cloudera.com> wrote:
> If you have a replicated cluster, it's likely that the master already
> re-replicated non-corrupt versions of these tablets to other machines.
>
> It would still be great if you can send one of the WAL directories to
> me off-list so I can take a look and try to understand what's going
> on.
>
> Thanks
> -Todd
>
> On Fri, Feb 19, 2016 at 5:05 PM, Nick Wolf <nickwo...@gmail.com> wrote:
>> I've identified the tablet ID and tried to delete it and start the server,
>> but it seems like a chain reaction. They keep coming one by one with
>> different tablet IDs.
>> rm tablet-meta/1c2475126c7c4cc2b82f95bd6af5bdb4
>> rm wals/1c2475126c7c4cc2b82f95bd6af5bdb4
>> rm consensus-meta/1c2475126c7c4cc2b82f95bd6af5bdb4
>>
>> A notable point here is that none of the tablet IDs that it shows
>> bootstrapping appear in the web interface (http://host:8051/tables).
>>
>> On Fri, Feb 19, 2016 at 12:49 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>>
>>> Hi Nick,
>>>
>>> Are you able to determine the tablet ID that is failing to restart?
>>> The log line indicates that it's thread ID 6285. If you look farther
>>> up the log with 'grep " 6285 " kudu-tserver.INFO', you should see a
>>> log message indicating that that thread is starting to bootstrap a
>>> particular tablet.
>>>
>>> Is this a replicated table, or num_replicas=1? If it's replicated, we
>>> can probably recover by removing the corrupt replica and letting it
>>> grab a new copy from one of the other replicas. Otherwise, we'll have
>>> to do some more serious "surgery" which we can assist you with.
>>>
>>> Either way, see if you can figure out the bad tablet ID. Then, if it's
>>> possible to send a copy of the WAL directory for this tablet to me off
>>> list, I can try to do some post-mortem analysis to see what went
>>> wrong.
>>>
>>> Thanks
>>> -Todd
>>>
>>> On Fri, Feb 19, 2016 at 12:37 PM, Nick Wolf <nickwo...@gmail.com> wrote:
>>> > KUDU Tablet crashed with the following fatal error.
>>> >
>>> > F0219 12:15:11.389806  6285 mvcc.cc:542] Check failed: _s.ok() Bad
>>> > status:
>>> > Illegal state: Timestamp: 5963266013874102274 is already committed.
>>> > Current
>>> > Snapshot: MvccSnapshot[committed={T|T < 5963266013874118554 or (T in
>>> > {5963266013874118554})}]
>>> >
>>> > It throws the same fatal error and crashes immediately no matter how
>>> > many times I try to restart the service.
>>> >
>>> > Any ideas to get out of this situation? I don't want to lose the data.
>>> >
>>> >
>>> > --Nick
>>> >
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [KUDU Tablet]unrecoverable crash

2016-02-19 Thread Todd Lipcon
Hi Nick,

Are you able to determine the tablet ID that is failing to restart?
The log line indicates that it's thread ID 6285. If you look farther
up the log with 'grep " 6285 " kudu-tserver.INFO', you should see a
log message indicating that that thread is starting to bootstrap a
particular tablet.

Is this a replicated table, or num_replicas=1? If it's replicated, we
can probably recover by removing the corrupt replica and letting it
grab a new copy from one of the other replicas. Otherwise, we'll have
to do some more serious "surgery" which we can assist you with.

Either way, see if you can figure out the bad tablet ID. Then, if it's
possible to send a copy of the WAL directory for this tablet to me off
list, I can try to do some post-mortem analysis to see what went
wrong.

Thanks
-Todd

On Fri, Feb 19, 2016 at 12:37 PM, Nick Wolf <nickwo...@gmail.com> wrote:
> KUDU Tablet crashed with the following fatal error.
>
> F0219 12:15:11.389806  6285 mvcc.cc:542] Check failed: _s.ok() Bad status:
> Illegal state: Timestamp: 5963266013874102274 is already committed. Current
> Snapshot: MvccSnapshot[committed={T|T < 5963266013874118554 or (T in
> {5963266013874118554})}]
>
> It throws the same fatal error and crashes immediately no matter how many
> times I try to restart the service.
>
> Any ideas to get out of this situation? I don't want to lose the data.
>
>
> --Nick
>



-- 
Todd Lipcon
Software Engineer, Cloudera