Re: Performance Question

2016-07-18 Thread Benjamin Kim
Todd,

I upgraded, deleted the table and recreated it because it was inaccessible, and 
re-introduced the downed tablet server after clearing out all Kudu 
directories.
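
(For illustration: recreating a table with a healthier replication factor looks 
roughly like the sketch below, using the pre-1.0 Java client API 
(org.kududb.client) seen elsewhere in this thread. The master address, table 
name, schema, and partitioning are placeholders, not the actual job's settings.)

import org.kududb.ColumnSchema.ColumnSchemaBuilder
import org.kududb.{Schema, Type}
import org.kududb.client.{CreateTableOptions, KuduClient}
import scala.collection.JavaConverters._

object RecreateTable {
  def main(args: Array[String]): Unit = {
    // Placeholder master address -- substitute your own.
    val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()
    try {
      // Illustrative two-column schema; the real table's schema will differ.
      val columns = List(
        new ColumnSchemaBuilder("id", Type.STRING).key(true).build(),
        new ColumnSchemaBuilder("value", Type.STRING).build()
      ).asJava
      val schema = new Schema(columns)

      // The important part: 3 replicas per tablet instead of 1, so a single
      // downed tablet server no longer makes tablets unrecoverable.
      val options = new CreateTableOptions()
        .setNumReplicas(3)
        .addHashPartitions(List("id").asJava, 4) // placeholder partitioning
      client.createTable("my_table", schema, options)
    } finally {
      client.shutdown()
    }
  }
}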

The Spark Streaming job is repopulating again.
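
(For context, a repopulation job of this shape can be sketched as below: a 
Spark Streaming job that writes each batch to Kudu with the plain Java client. 
This is a hedged illustration only; the input source, master address, table 
name, and columns are placeholders rather than the actual job.)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.kududb.client.{KuduClient, SessionConfiguration}

object RepopulateKuduTable {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("kudu-repopulate"), Seconds(10))

    // Placeholder input: a socket stream of "id,value" lines.
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One client and session per partition; master and table are placeholders.
        val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()
        try {
          val table = client.openTable("my_table")
          val session = client.newSession()
          session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND)
          records.foreach { line =>
            val Array(id, value) = line.split(",", 2)
            val insert = table.newInsert()
            insert.getRow.addString("id", id)
            insert.getRow.addString("value", value)
            session.apply(insert)
          }
          session.close() // flush any buffered operations
        } finally {
          client.shutdown()
        }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}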

Thanks,
Ben


> On Jul 18, 2016, at 10:32 AM, Todd Lipcon  wrote:
> 
> On Mon, Jul 18, 2016 at 10:31 AM, Benjamin Kim wrote:
> Todd,
> 
> Thanks for the info. I was going to upgrade after the testing, but now, it 
> looks like I will have to do it earlier than expected.
> 
> I will do the upgrade, then resume.
> 
> OK, sounds good. The upgrade shouldn't invalidate any performance testing or 
> anything -- just fixes this important bug.
> 
> -Todd
> 
> 
>> On Jul 18, 2016, at 10:29 AM, Todd Lipcon wrote:
>> 
>> Hi Ben,
>> 
>> Any chance that you are running Kudu 0.9.0 instead of 0.9.1? There's a known 
>> serious bug in 0.9.0 which can cause this kind of corruption.
>> 
>> Assuming that you are running with replication count 3 this time, you should 
>> be able to move aside that tablet metadata file and start the server. It 
>> will recreate a new repaired replica automatically.
>> 
>> -Todd
>> 
>> On Mon, Jul 18, 2016 at 10:28 AM, Benjamin Kim wrote:
>> During my re-population of the Kudu table, I am getting this error trying to 
>> restart a tablet server after it went down. The job that populates this 
>> table has been running for over a week.
>> 
>> [libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message 
>> of type "kudu.tablet.TabletSuperBlockPB" because it is missing required 
>> fields: rowsets[2324].columns[15].block
>> F0718 17:01:26.783571   468 tablet_server_main.cc:55] Check failed: _s.ok() 
>> Bad status: IO error: Could not init Tablet Manager: Failed to open tablet 
>> metadata for tablet: 24637ee6f3e5440181ce3f20b1b298ba: Failed to load tablet 
>> metadata for tablet id 24637ee6f3e5440181ce3f20b1b298ba: Could not load 
>> tablet metadata from 
>> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba: Unable to 
>> parse PB from path: 
>> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba
>> *** Check failure stack trace: ***
>> @   0x7d794d  google::LogMessage::Fail()
>> @   0x7d984d  google::LogMessage::SendToLog()
>> @   0x7d7489  google::LogMessage::Flush()
>> @   0x7da2ef  google::LogMessageFatal::~LogMessageFatal()
>> @   0x78172b  (unknown)
>> @   0x344d41ed5d  (unknown)
>> @   0x7811d1  (unknown)
>> 
>> Does anyone know what this means?
>> 
>> Thanks,
>> Ben
>> 
>> 
>>> On Jul 11, 2016, at 10:47 AM, Todd Lipcon wrote:
>>> 
>>> On Mon, Jul 11, 2016 at 10:40 AM, Benjamin Kim wrote:
>>> Todd,
>>> 
>>> I had it at one replica. Do I have to recreate?
>>> 
>>> We don't currently have the ability to "accept data loss" on a tablet (or 
>>> set of tablets). If the machine is gone for good, then currently the only 
>>> easy way to recover is to recreate the table. If this sounds really 
>>> painful, though, maybe we can work up some kind of tool you could use to 
>>> just recreate the missing tablets (with those rows lost).
>>> 
>>> -Todd
>>> 
On Jul 11, 2016, at 10:37 AM, Todd Lipcon wrote:
 
 Hey Ben,
 
 Is the table that you're querying replicated? Or was it created with only 
 one replica per tablet?
 
 -Todd
 
On Mon, Jul 11, 2016 at 10:35 AM, Benjamin Kim wrote:
 Over the weekend, a tablet server went down. It’s not coming back up. So, 
 I decommissioned it and removed it from the cluster. Then, I restarted 
Kudu because I was getting a timeout exception trying to do counts on the 
table. Now, when I try again, I get the same error.
 
 16/07/11 17:32:36 WARN scheduler.TaskSetManager: Lost task 468.3 in stage 
0.0 (TID 603, prod-dc1-datanode167.pdc1i.gradientx.com): 
 com.stumbleupon.async.TimeoutException: Timed out after 3ms when 
 joining Deferred@712342716(state=PAUSED, result=Deferred@1765902299, 
 callback=passthrough -> scanner opened -> wakeup thread Executor task 
 launch worker-2, errback=openScanner errback -> passthrough -> wakeup 
 thread Executor task launch worker-2)
 at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1177)
 at com.stumbleupon.async.Deferred.join(Deferred.java:1045)
 at org.kududb.client.KuduScanner.nextRows(KuduScanner.java:57)
 at org.kududb.spark.kudu.RowResultIteratorScala.hasNext(KuduRDD.scala:99)
 at 

Re: Performance Question

2016-07-18 Thread Todd Lipcon
On Mon, Jul 18, 2016 at 10:31 AM, Benjamin Kim  wrote:

> Todd,
>
> Thanks for the info. I was going to upgrade after the testing, but now, it
> looks like I will have to do it earlier than expected.
>
> I will do the upgrade, then resume.
>

OK, sounds good. The upgrade shouldn't invalidate any performance testing
or anything -- just fixes this important bug.

-Todd


> On Jul 18, 2016, at 10:29 AM, Todd Lipcon  wrote:
>
> Hi Ben,
>
> Any chance that you are running Kudu 0.9.0 instead of 0.9.1? There's a
> known serious bug in 0.9.0 which can cause this kind of corruption.
>
> Assuming that you are running with replication count 3 this time, you
> should be able to move aside that tablet metadata file and start the
> server. It will recreate a new repaired replica automatically.
>
> -Todd
>
> On Mon, Jul 18, 2016 at 10:28 AM, Benjamin Kim  wrote:
>
>> During my re-population of the Kudu table, I am getting this error trying
>> to restart a tablet server after it went down. The job that populates this
>> table has been running for over a week.
>>
>> [libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse
>> message of type "kudu.tablet.TabletSuperBlockPB" because it is missing
>> required fields: rowsets[2324].columns[15].block
>> F0718 17:01:26.783571   468 tablet_server_main.cc:55] Check failed:
>> _s.ok() Bad status: IO error: Could not init Tablet Manager: Failed to open
>> tablet metadata for tablet: 24637ee6f3e5440181ce3f20b1b298ba: Failed to
>> load tablet metadata for tablet id 24637ee6f3e5440181ce3f20b1b298ba: Could
>> not load tablet metadata from
>> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba: Unable
>> to parse PB from path:
>> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba
>> *** Check failure stack trace: ***
>> @   0x7d794d  google::LogMessage::Fail()
>> @   0x7d984d  google::LogMessage::SendToLog()
>> @   0x7d7489  google::LogMessage::Flush()
>> @   0x7da2ef  google::LogMessageFatal::~LogMessageFatal()
>> @   0x78172b  (unknown)
>> @   0x344d41ed5d  (unknown)
>> @   0x7811d1  (unknown)
>>
>> Does anyone know what this means?
>>
>> Thanks,
>> Ben
>>
>>
>> On Jul 11, 2016, at 10:47 AM, Todd Lipcon  wrote:
>>
>> On Mon, Jul 11, 2016 at 10:40 AM, Benjamin Kim 
>> wrote:
>>
>>> Todd,
>>>
>>> I had it at one replica. Do I have to recreate?
>>>
>>
>> We don't currently have the ability to "accept data loss" on a tablet (or
>> set of tablets). If the machine is gone for good, then currently the only
>> easy way to recover is to recreate the table. If this sounds really
>> painful, though, maybe we can work up some kind of tool you could use to
>> just recreate the missing tablets (with those rows lost).
>>
>> -Todd
>>
>>>
>>> On Jul 11, 2016, at 10:37 AM, Todd Lipcon  wrote:
>>>
>>> Hey Ben,
>>>
>>> Is the table that you're querying replicated? Or was it created with
>>> only one replica per tablet?
>>>
>>> -Todd
>>>
>>> On Mon, Jul 11, 2016 at 10:35 AM, Benjamin Kim  wrote:
>>>
 Over the weekend, a tablet server went down. It’s not coming back up.
 So, I decommissioned it and removed it from the cluster. Then, I restarted
Kudu because I was getting a timeout exception trying to do counts on the
table. Now, when I try again, I get the same error.

 16/07/11 17:32:36 WARN scheduler.TaskSetManager: Lost task 468.3 in
 stage 0.0 (TID 603, prod-dc1-datanode167.pdc1i.gradientx.com):
 com.stumbleupon.async.TimeoutException: Timed out after 3ms when
 joining Deferred@712342716(state=PAUSED, result=Deferred@1765902299,
 callback=passthrough -> scanner opened -> wakeup thread Executor task
 launch worker-2, errback=openScanner errback -> passthrough -> wakeup
 thread Executor task launch worker-2)
 at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1177)
 at com.stumbleupon.async.Deferred.join(Deferred.java:1045)
 at org.kududb.client.KuduScanner.nextRows(KuduScanner.java:57)
 at
 org.kududb.spark.kudu.RowResultIteratorScala.hasNext(KuduRDD.scala:99)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at
 org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
 at
 org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
 at
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
 at
 org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
 at
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
 at 

Re: Performance Question

2016-07-18 Thread Benjamin Kim
Todd,

Thanks for the info. I was going to upgrade after the testing, but now, it 
looks like I will have to do it earlier than expected.

I will do the upgrade, then resume.

Cheers,
Ben


> On Jul 18, 2016, at 10:29 AM, Todd Lipcon  wrote:
> 
> Hi Ben,
> 
> Any chance that you are running Kudu 0.9.0 instead of 0.9.1? There's a known 
> serious bug in 0.9.0 which can cause this kind of corruption.
> 
> Assuming that you are running with replication count 3 this time, you should 
> be able to move aside that tablet metadata file and start the server. It will 
> recreate a new repaired replica automatically.
> 
> -Todd
> 
> On Mon, Jul 18, 2016 at 10:28 AM, Benjamin Kim wrote:
> During my re-population of the Kudu table, I am getting this error trying to 
> restart a tablet server after it went down. The job that populates this table 
> has been running for over a week.
> 
> [libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message 
> of type "kudu.tablet.TabletSuperBlockPB" because it is missing required 
> fields: rowsets[2324].columns[15].block
> F0718 17:01:26.783571   468 tablet_server_main.cc:55] Check failed: _s.ok() 
> Bad status: IO error: Could not init Tablet Manager: Failed to open tablet 
> metadata for tablet: 24637ee6f3e5440181ce3f20b1b298ba: Failed to load tablet 
> metadata for tablet id 24637ee6f3e5440181ce3f20b1b298ba: Could not load 
> tablet metadata from 
> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba: Unable to 
> parse PB from path: 
> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba
> *** Check failure stack trace: ***
> @   0x7d794d  google::LogMessage::Fail()
> @   0x7d984d  google::LogMessage::SendToLog()
> @   0x7d7489  google::LogMessage::Flush()
> @   0x7da2ef  google::LogMessageFatal::~LogMessageFatal()
> @   0x78172b  (unknown)
> @   0x344d41ed5d  (unknown)
> @   0x7811d1  (unknown)
> 
> Does anyone know what this means?
> 
> Thanks,
> Ben
> 
> 
>> On Jul 11, 2016, at 10:47 AM, Todd Lipcon wrote:
>> 
>> On Mon, Jul 11, 2016 at 10:40 AM, Benjamin Kim wrote:
>> Todd,
>> 
>> I had it at one replica. Do I have to recreate?
>> 
>> We don't currently have the ability to "accept data loss" on a tablet (or 
>> set of tablets). If the machine is gone for good, then currently the only 
>> easy way to recover is to recreate the table. If this sounds really painful, 
>> though, maybe we can work up some kind of tool you could use to just 
>> recreate the missing tablets (with those rows lost).
>> 
>> -Todd
>> 
>>> On Jul 11, 2016, at 10:37 AM, Todd Lipcon wrote:
>>> 
>>> Hey Ben,
>>> 
>>> Is the table that you're querying replicated? Or was it created with only 
>>> one replica per tablet?
>>> 
>>> -Todd
>>> 
>>> On Mon, Jul 11, 2016 at 10:35 AM, Benjamin Kim wrote:
>>> Over the weekend, a tablet server went down. It’s not coming back up. So, I 
>>> decommissioned it and removed it from the cluster. Then, I restarted Kudu 
>>> because I was getting a timeout exception trying to do counts on the 
>>> table. Now, when I try again, I get the same error.
>>> 
>>> 16/07/11 17:32:36 WARN scheduler.TaskSetManager: Lost task 468.3 in stage 
>>> 0.0 (TID 603, prod-dc1-datanode167.pdc1i.gradientx.com): 
>>> com.stumbleupon.async.TimeoutException: Timed out after 3ms when 
>>> joining Deferred@712342716(state=PAUSED, result=Deferred@1765902299, 
>>> callback=passthrough -> scanner opened -> wakeup thread Executor task 
>>> launch worker-2, errback=openScanner errback -> passthrough -> wakeup 
>>> thread Executor task launch worker-2)
>>> at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1177)
>>> at com.stumbleupon.async.Deferred.join(Deferred.java:1045)
>>> at org.kududb.client.KuduScanner.nextRows(KuduScanner.java:57)
>>> at org.kududb.spark.kudu.RowResultIteratorScala.hasNext(KuduRDD.scala:99)
>>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>> at 
>>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
>>> at 
>>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
>>> at 
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>> at 
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>> at 

Re: Performance Question

2016-07-18 Thread Todd Lipcon
Hi Ben,

Any chance that you are running Kudu 0.9.0 instead of 0.9.1? There's a
known serious bug in 0.9.0 which can cause this kind of corruption.

Assuming that you are running with replication count 3 this time, you
should be able to move aside that tablet metadata file and start the
server. It will recreate a new repaired replica automatically.
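
(To make that recovery step concrete: with the tablet server stopped, rename the
superblock file from the log out of the way, then restart the process. An
ordinary shell mv is the usual way; the Scala sketch below does the same thing,
with the path taken from the error above.)

import java.nio.file.{Files, Paths, StandardCopyOption}

object MoveAsideTabletMeta {
  def main(args: Array[String]): Unit = {
    // Path taken from the error message above; adjust for your data directory.
    val meta = Paths.get(
      "/mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba")
    // Keep the old file around under a .corrupt suffix rather than deleting it.
    val backup = Paths.get(meta.toString + ".corrupt")
    Files.move(meta, backup, StandardCopyOption.ATOMIC_MOVE)
    // Restart kudu-tserver afterwards; with replication factor 3 the tablet
    // should be re-replicated from a healthy peer, per the note above.
    println(s"Moved $meta -> $backup")
  }
}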

-Todd

On Mon, Jul 18, 2016 at 10:28 AM, Benjamin Kim  wrote:

> During my re-population of the Kudu table, I am getting this error trying
> to restart a tablet server after it went down. The job that populates this
> table has been running for over a week.
>
> [libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse
> message of type "kudu.tablet.TabletSuperBlockPB" because it is missing
> required fields: rowsets[2324].columns[15].block
> F0718 17:01:26.783571   468 tablet_server_main.cc:55] Check failed:
> _s.ok() Bad status: IO error: Could not init Tablet Manager: Failed to open
> tablet metadata for tablet: 24637ee6f3e5440181ce3f20b1b298ba: Failed to
> load tablet metadata for tablet id 24637ee6f3e5440181ce3f20b1b298ba: Could
> not load tablet metadata from
> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba: Unable
> to parse PB from path:
> /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba
> *** Check failure stack trace: ***
> @   0x7d794d  google::LogMessage::Fail()
> @   0x7d984d  google::LogMessage::SendToLog()
> @   0x7d7489  google::LogMessage::Flush()
> @   0x7da2ef  google::LogMessageFatal::~LogMessageFatal()
> @   0x78172b  (unknown)
> @   0x344d41ed5d  (unknown)
> @   0x7811d1  (unknown)
>
> Does anyone know what this means?
>
> Thanks,
> Ben
>
>
> On Jul 11, 2016, at 10:47 AM, Todd Lipcon  wrote:
>
> On Mon, Jul 11, 2016 at 10:40 AM, Benjamin Kim  wrote:
>
>> Todd,
>>
>> I had it at one replica. Do I have to recreate?
>>
>
> We don't currently have the ability to "accept data loss" on a tablet (or
> set of tablets). If the machine is gone for good, then currently the only
> easy way to recover is to recreate the table. If this sounds really
> painful, though, maybe we can work up some kind of tool you could use to
> just recreate the missing tablets (with those rows lost).
>
> -Todd
>
>>
>> On Jul 11, 2016, at 10:37 AM, Todd Lipcon  wrote:
>>
>> Hey Ben,
>>
>> Is the table that you're querying replicated? Or was it created with only
>> one replica per tablet?
>>
>> -Todd
>>
>> On Mon, Jul 11, 2016 at 10:35 AM, Benjamin Kim  wrote:
>>
>>> Over the weekend, a tablet server went down. It’s not coming back up.
>>> So, I decommissioned it and removed it from the cluster. Then, I restarted
>>> Kudu because I was getting a timeout exception trying to do counts on the
>>> table. Now, when I try again, I get the same error.
>>>
>>> 16/07/11 17:32:36 WARN scheduler.TaskSetManager: Lost task 468.3 in
>>> stage 0.0 (TID 603, prod-dc1-datanode167.pdc1i.gradientx.com):
>>> com.stumbleupon.async.TimeoutException: Timed out after 3ms when
>>> joining Deferred@712342716(state=PAUSED, result=Deferred@1765902299,
>>> callback=passthrough -> scanner opened -> wakeup thread Executor task
>>> launch worker-2, errback=openScanner errback -> passthrough -> wakeup
>>> thread Executor task launch worker-2)
>>> at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1177)
>>> at com.stumbleupon.async.Deferred.join(Deferred.java:1045)
>>> at org.kududb.client.KuduScanner.nextRows(KuduScanner.java:57)
>>> at org.kududb.spark.kudu.RowResultIteratorScala.hasNext(KuduRDD.scala:99)
>>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>> at
>>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
>>> at
>>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
>>> at
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>> at
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>> at
>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>> at
>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>> at
>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>> at
>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>> at
>>> 

Re: Performance Question

2016-07-18 Thread Benjamin Kim
During my re-population of the Kudu table, I am getting this error trying to 
restart a tablet server after it went down. The job that populates this table 
has been running for over a week.

[libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of 
type "kudu.tablet.TabletSuperBlockPB" because it is missing required fields: 
rowsets[2324].columns[15].block
F0718 17:01:26.783571   468 tablet_server_main.cc:55] Check failed: _s.ok() Bad 
status: IO error: Could not init Tablet Manager: Failed to open tablet metadata 
for tablet: 24637ee6f3e5440181ce3f20b1b298ba: Failed to load tablet metadata 
for tablet id 24637ee6f3e5440181ce3f20b1b298ba: Could not load tablet metadata 
from /mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba: Unable 
to parse PB from path: 
/mnt/data1/kudu/data/tablet-meta/24637ee6f3e5440181ce3f20b1b298ba
*** Check failure stack trace: ***
@   0x7d794d  google::LogMessage::Fail()
@   0x7d984d  google::LogMessage::SendToLog()
@   0x7d7489  google::LogMessage::Flush()
@   0x7da2ef  google::LogMessageFatal::~LogMessageFatal()
@   0x78172b  (unknown)
@   0x344d41ed5d  (unknown)
@   0x7811d1  (unknown)

Does anyone know what this means?

Thanks,
Ben


> On Jul 11, 2016, at 10:47 AM, Todd Lipcon  wrote:
> 
> On Mon, Jul 11, 2016 at 10:40 AM, Benjamin Kim wrote:
> Todd,
> 
> I had it at one replica. Do I have to recreate?
> 
> We don't currently have the ability to "accept data loss" on a tablet (or set 
> of tablets). If the machine is gone for good, then currently the only easy 
> way to recover is to recreate the table. If this sounds really painful, 
> though, maybe we can work up some kind of tool you could use to just recreate 
> the missing tablets (with those rows lost).
> 
> -Todd
> 
>> On Jul 11, 2016, at 10:37 AM, Todd Lipcon wrote:
>> 
>> Hey Ben,
>> 
>> Is the table that you're querying replicated? Or was it created with only 
>> one replica per tablet?
>> 
>> -Todd
>> 
>> On Mon, Jul 11, 2016 at 10:35 AM, Benjamin Kim wrote:
>> Over the weekend, a tablet server went down. It’s not coming back up. So, I 
>> decommissioned it and removed it from the cluster. Then, I restarted Kudu 
>> because I was getting a timeout exception trying to do counts on the table. 
>> Now, when I try again, I get the same error.
>> 
>> 16/07/11 17:32:36 WARN scheduler.TaskSetManager: Lost task 468.3 in stage 
>> 0.0 (TID 603, prod-dc1-datanode167.pdc1i.gradientx.com): 
>> com.stumbleupon.async.TimeoutException: Timed out after 3ms when joining 
>> Deferred@712342716(state=PAUSED, result=Deferred@1765902299, 
>> callback=passthrough -> scanner opened -> wakeup thread Executor task launch 
>> worker-2, errback=openScanner errback -> passthrough -> wakeup thread 
>> Executor task launch worker-2)
>> at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1177)
>> at com.stumbleupon.async.Deferred.join(Deferred.java:1045)
>> at org.kududb.client.KuduScanner.nextRows(KuduScanner.java:57)
>> at org.kududb.spark.kudu.RowResultIteratorScala.hasNext(KuduRDD.scala:99)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at 
>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
>> at 
>> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
>> at 
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>> at 
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> 
>> Does anyone know how to recover from this?
>> 
>> Thanks,
>> Benjamin Kim
>> Data Solutions Architect
>> 
>> [a•mo•bee] (n.) the company defining digital marketing.
>> 
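
(For reference, the count that keeps timing out above is just a full scan of 
the table through the kudu-spark DataSource. A minimal sketch, assuming the 
org.kududb.spark.kudu package from the stack trace and its usual 
"kudu.master"/"kudu.table" options; both values are placeholders.)

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object CountKuduTable {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kudu-count"))
    val sqlContext = new SQLContext(sc)

    val df = sqlContext.read
      .format("org.kududb.spark.kudu")
      .option("kudu.master", "kudu-master:7051") // placeholder master address
      .option("kudu.table", "my_table")          // placeholder table name
      .load()

    // With a single-replica table and a dead tablet server this scan times out,
    // as in the stack trace above; with 3 replicas it should succeed.
    println(s"row count = ${df.count()}")
    sc.stop()
  }
}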

Helping Spread the Word about Apachecon EU 2016

2016-07-18 Thread Sharan Foga
Hi Everyone

I'm forwarding the following message on behalf of Rich Bowen and the 
Apachecon team
==

As you are aware, we are holding ApacheCon in Seville in November. While this 
seems like a long way away, it is critical that we get on people's calendar 
now, so that they can plan, get budget approval, and spread the word to their 
contacts.

Here's how you can help.

If you Tweet, please consider using some of the following sample tweets to get 
the word out:


Save the date. #ApacheCon is coming to Seville, November 14-18 2016. 
http://apachecon.com/

Come join me at @ApacheCon in Seville in November. http://apachecon.com/

#ApacheBigData is the best place to learn what's next in the world of big data. 
November 14-16 in Seville http://apachecon.com/

 @TheASF is 300 projects strong and growing. Come learn about all of them at 
@ApacheCon in Seville - http://apachecon.com/


Follow @ApacheCon and @TheASF, and retweet mentions of ApacheCon, to spread the 
word. 


If you use other social media platforms, share the URLs of the events and their 
CFPs, to collect the broadest possible audience for our events, as well as 
getting the best content:


Big Data: Website: 
http://events.linuxfoundation.org/events/apache-big-data-europe
CFP: http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp


ApacheCon: Website: http://events.linuxfoundation.org/events/apachecon-europe
CFP: http://events.linuxfoundation.org/events/apachecon-europe/program/cfp


And, finally, if your employer benefits from the work that we do at Apache, or 
is looking for the brightest software developers in the industry, encourage 
them to sponsor the event. Sponsorship is available at all levels. Have them 
contact me at e...@apache.org for a prospectus, and I'll make the right 
introductions. Sponsors in the past include … well, everyone. You have to go 
pretty deep into the Fortune technology list 
(http://fortune.com/2015/06/13/fortune-500-tech/) to find a company that hasn't 
sponsored ApacheCon.

==

Thanks
Sharan