Re: Problems connecting from Spark

2018-03-06 Thread William Berkeley
In each case the problem is that some part of your application can't find
the leader master of the Kudu cluster:

org.apache.kudu.client.NoLeaderFoundException: Master config (*172.17.0.43:7077*) has no leader.
org.apache.kudu.client.NoLeaderFoundException: Master config (*localhost:7051*) has no leader.

I think you're seeing these errors for two reasons:

1. Are you using multi-master? The first exception shows you specified one
remote master. If your cluster has multiple masters, you should specify all
of them. If you specify only one, and it's not the leader master, then
connecting to it will fail. You can check which master is the leader by
going to the /masters page on the web UI of any master.

2. In the "standalone" case, the Spark tasks are being distributed to
executors and fail there:

Lost task 1.0 in stage 0.0 (TID 1, tt-slave-2.novalocal, executor 1)

You've specified the master address as localhost. That address is passed
as-is to executors. Any task on an executor that doesn't have the leader
master locally at port 7051 will fail to connect to the leader master.
Getting the column names doesn't fail as that doesn't generate tasks sent
to remote executors.
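
Both points can be checked before the job is submitted. The following is a minimal sketch, not part of kudu-spark: it builds the comma-separated "kudu.master" value from a list of masters and rejects loopback addresses, which only resolve on the machine that typed them. The helper names and the kudu-master-N hostnames are hypothetical, 7051 is assumed as the port because it is Kudu's default master RPC port, and only hostnames or IPv4 addresses are handled.

```python
import ipaddress

# Kudu's default master RPC port, used when an address gives no port.
KUDU_MASTER_RPC_PORT = 7051


def is_loopback(host):
    """Return True if host refers only to the local machine."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so treat it as an ordinary hostname.
        return False


def kudu_master_option(masters):
    """Build the comma-separated value for the "kudu.master" option."""
    addresses = []
    for master in masters:
        host, _, port = master.partition(":")
        if is_loopback(host):
            raise ValueError(
                "%r is a loopback address; executors on other hosts "
                "cannot reach the leader master through it" % master)
        addresses.append("%s:%s" % (host, port or KUDU_MASTER_RPC_PORT))
    return ",".join(addresses)


# Hypothetical hostnames: replace them with the masters listed on the
# /masters page of your cluster.
options = {
    "kudu.master": kudu_master_option(
        ["kudu-master-1", "kudu-master-2:7051", "kudu-master-3"]),
    "kudu.table": "impala::default.test",
}
```

In a pyspark session the resulting dict would be passed the same way as in the original report: sqlContext.read.format('org.apache.kudu.spark.kudu').options(**options).load().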

I make this mistake all the time while playing with kudu-spark :)

-Will





On Mon, Mar 5, 2018 at 4:14 PM, Mac Noland wrote:

> Any chance you can try spark2-shell with Kudu 1.6 and then re-try your
> tests?
>
> spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.6.0

Re: Problems connecting from Spark

2018-03-05 Thread Mac Noland
Any chance you can try spark2-shell with Kudu 1.6 and then re-try your
tests?

spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.6.0


Problems connecting from Spark

2018-03-02 Thread Saúl Nogueras
I cannot properly connect to Kudu from Spark, error says “Kudu master has
no leader”

   - CDH 5.14
   - Kudu 1.6
   - Spark 1.6.0 standalone and 2.2.0

When I use Impala in HUE to create and query kudu tables, it works
flawlessly.

However, connecting from Spark throws some errors I cannot decipher.

I have tried using both pyspark and spark-shell. With spark-shell I had to
use Spark 1.6 instead of 2.2 because of some Maven dependency problems that
I have tracked down but not been able to fix. More info here.
--
Case 1: using pyspark2 (Spark 2.2.0)

$ pyspark2 --master yarn --jars /opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/kudu/kudu-spark2_2.11.jar

> df = sqlContext.read.format('org.apache.kudu.spark.kudu').options(**{"kudu.master":"172.17.0.43:7077", "kudu.table":"impala::default.test"}).load()

18/03/02 10:23:27 WARN client.ConnectToCluster: Error receiving response from 172.17.0.43:7077
org.apache.kudu.client.RecoverableException: [peer master-172.17.0.43:7077] encountered a read timeout; closing the channel
    at org.apache.kudu.client.Connection.exceptionCaught(Connection.java:412)
    at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
    at org.apache.kudu.client.Connection.handleUpstream(Connection.java:239)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153)
    at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:536)
    at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:236)
    at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask$1.run(ReadTimeoutHandler.java:276)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutException
    at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.(ReadTimeoutHandler.java:84)
    at org.apache.kudu.client.Connection$ConnectionPipeline.init(Connection.java:782)
    at org.apache.kudu.client.Connection.(Connection.java:199)
    at org.apache.kudu.client.ConnectionCache.getConnection(ConnectionCache.java:133)
    at org.apache.kudu.client.AsyncKuduClient.newRpcProxy(AsyncKuduClient.java:248)
    at org.apache.kudu.client.AsyncKuduClient.newMasterRpcProxy(AsyncKuduClient.java:272)
    at org.apache.kudu.client.ConnectToCluster.run(ConnectToCluster.java:157)
    at org.apache.kudu.client.AsyncKuduClient.getMasterTableLocationsPB(AsyncKuduClient.java:1350)
    at org.apache.kudu.client.AsyncKuduClient.exportAuthenticationCredentials(AsyncKuduClient.java:651)
    at org.apache.kudu.client.KuduClient.exportAuthenticationCredentials(KuduClient.java:293)
    at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:97)
    at org.apache.kudu.spark.kudu.KuduContext$$anon$1.run(KuduContext.scala:96)
    at java.security.AccessController.doPrivileged(Native Method)
    at