I ran into another problem when trying geomesa+spark:
http://www.geomesa.org/spark/
For some reason, running the example there results in the
following error:
scala> queryRdd.count
java.lang.NullPointerException
        at org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.unwrapAuthenticationToken(ConfiguratorBase.java:493)
        at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.validateOptions(AbstractInputFormat.java:390)
        at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.getSplits(AbstractInputFormat.java:668)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
        …
hadoopToken is probably null here:
https://github.com/apache/accumulo/blob/branch-1.7.0/core/src/main/java/org/apache/accumulo/core/client/mapreduce/lib/impl/ConfiguratorBase.java#L493
I suspect it's related to the way GeoMesa creates the RDD. The following
doesn't seem to be sufficient:
https://github.com/locationtech/geomesa/blob/rc7_a1.7_h2.5/geomesa-compute/src/main/scala/org/locationtech/geomesa/compute/spark/GeoMesaSpark.scala#L69
That's because Accumulo adds the Hadoop token to the hadoop.mapreduce.Job
itself, in addition to the Configuration, here:
https://github.com/apache/accumulo/blob/branch-1.7.0/core/src/main/java/org/apache/accumulo/core/client/mapreduce/AbstractInputFormat.java#L130
Spark's newAPIHadoopRDD, however, only takes a hadoop.conf.Configuration
object rather than a Job, so the Job's credentials never reach Spark:
https://github.com/apache/spark/blob/v1.3.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L870
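
To make the mismatch concrete, here is a rough sketch of the failing
shape (untested; it assumes the spark-shell's `sc`, a kerberized
client.conf on the classpath, and placeholder principal/table names):

import org.apache.accumulo.core.client.ClientConfiguration
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
import org.apache.accumulo.core.client.security.tokens.KerberosToken
import org.apache.accumulo.core.data.{Key, Value}
import org.apache.hadoop.mapreduce.Job

val job = Job.getInstance()
AccumuloInputFormat.setZooKeeperInstance(job, ClientConfiguration.loadDefault())
// With a KerberosToken, setConnectorInfo obtains a DelegationToken and adds
// the corresponding Hadoop Token to job.getCredentials(), not to the
// Configuration.
AccumuloInputFormat.setConnectorInfo(job, "user@EXAMPLE.COM", new KerberosToken())
AccumuloInputFormat.setInputTableName(job, "sometable")

// Spark only accepts the Configuration, so job.getCredentials() is dropped:
val rdd = sc.newAPIHadoopRDD(job.getConfiguration,
  classOf[AccumuloInputFormat], classOf[Key], classOf[Value])

// getSplits -> validateOptions -> unwrapAuthenticationToken then looks up
// the token in the credentials of a JobContext rebuilt from that bare
// Configuration, finds nothing, and throws the NPE above.
rdd.count()
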
This seems to be a more general issue between Accumulo and Spark. So,
has anyone tried using Spark to access data in a kerberized Accumulo
cluster?
Thanks.
-Simon
On Sun, Jun 7, 2015 at 3:54 PM, James Hughes <[email protected]> wrote:
> Josh,
>
> Thanks. That's more or less what I expected.
>
> As we work to transition from Mock to MiniAccumulo, I'd want to change from
> spinning up lots of MockInstances to one MiniAccumulo. To understand that
> pattern, do I basically just need to read through test sub-module and the
> test/pom.xml? Are there any other resources I should be checking out?
>
> Cheers,
>
> Jim
>
> On Sun, Jun 7, 2015 at 1:37 PM, Josh Elser <[email protected]> wrote:
>>
>> MiniAccumulo, yes. MockAccumulo, no. In general, we've near completely
>> moved away from MockAccumulo. I wouldn't be surprised if it gets deprecated
>> and removed soon.
>>
>>
>> https://github.com/apache/accumulo/blob/1.7/test/src/test/java/org/apache/accumulo/test/functional/KerberosIT.java
>>
>> Apache Directory provides a MiniKdc that can be used easily w/
>> MiniAccumulo. Many of the integration tests have already been altered to
>> support running w/ or w/o kerberos.
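>>
>> Roughly, the MiniKdc setup looks like this (just a sketch, not the exact
>> KerberosIT code; the work dir and principal names here are placeholders):
>>
>> import java.io.File
>> import org.apache.hadoop.minikdc.MiniKdc
>>
>> val workDir = new File("/tmp/minikdc")
>> val kdc = new MiniKdc(MiniKdc.createConf(), workDir)
>> kdc.start()
>>
>> // one principal + keytab per client/server process; MiniAccumulo's
>> // servers need their own, created the same way
>> val keytab = new File(workDir, "client.keytab")
>> kdc.createPrincipal(keytab, "client")
>> // the full principal name is "client@" + kdc.getRealm()
>>
>> // ...wire these into the MiniAccumulo config, then kdc.stop() at teardown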
>>
>> James Hughes wrote:
>>>
>>> Hi all,
>>>
>>> For GeoMesa, stats writing is quite secondary and optional, so we can
>>> sort that out as a follow-on to seeing GeoMesa work with Accumulo 1.7.
>>>
>>> I haven't had a chance to read in details yet, so forgive me if this is
>>> covered in the docs. Does either Mock or MiniAccumulo provide enough
>>> hooks to test out Kerberos integration effectively? I suppose I'm
>>> really asking what kind of testing environment a project like GeoMesa
>>> would need to use to test out Accumulo 1.7.
>>>
>>> Even though MockAccumulo has a number of limitations, it is rather
>>> effective in unit tests which can be part of a quick build.
>>>
>>> Thanks,
>>>
>>> Jim
>>>
>>> On Sat, Jun 6, 2015 at 11:14 PM, Xu (Simon) Chen <[email protected]> wrote:
>>>
>>> Nope, I am running the example as what the readme file suggested:
>>>
>>> java -cp ./target/geomesa-quickstart-1.0-SNAPSHOT.jar
>>> org.geomesa.QuickStart -instanceId somecloud -zookeepers
>>> "zoo1:2181,zoo2:2181,zoo3:2181" -user someuser -password somepwd
>>> -tableName sometable
>>>
>>> I'll raise this question with the geomesa folks, but you're right that
>>> I can ignore it for now...
>>>
>>> Thanks!
>>> -Simon
>>>
>>>
>>> On Sat, Jun 6, 2015 at 11:00 PM, Josh Elser <[email protected]> wrote:
>>> > Are you running it via `mvn exec:java` by chance or netbeans?
>>> >
>>> > http://mail-archives.apache.org/mod_mbox/accumulo-user/201411.mbox/%[email protected]%3E
>>> >
>>> > If that's just a background thread writing in Stats, it might just be
>>> > a factor of how you're invoking the program and you can ignore it. I
>>> > don't know enough about the inner-workings of GeoMesa to say one way
>>> > or the other.
>>> >
>>> >
>>> > Xu (Simon) Chen wrote:
>>> >>
>>> >> Josh,
>>> >>
>>> >> Everything works well, except for one thing :-)
>>> >>
>>> >> I am running the geomesa-quickstart program, which ingests some data
>>> >> and then performs a simple query:
>>> >> https://github.com/geomesa/geomesa-quickstart
>>> >>
>>> >> For some reason, the following error is emitted consistently at the
>>> >> end of the execution, after outputting the correct result:
>>> >> 15/06/07 00:29:22 INFO zookeeper.ZooCache: Zookeeper error, will retry
>>> >> java.lang.InterruptedException
>>> >>         at java.lang.Object.wait(Native Method)
>>> >>         at java.lang.Object.wait(Object.java:503)
>>> >>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
>>> >>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>> >>         at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:264)
>>> >>         at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:162)
>>> >>         at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:289)
>>> >>         at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:238)
>>> >>         at org.apache.accumulo.core.client.impl.Tables.getTableState(Tables.java:180)
>>> >>         at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:82)
>>> >>         at org.apache.accumulo.core.client.impl.ConnectorImpl.createBatchWriter(ConnectorImpl.java:128)
>>> >>         at org.locationtech.geomesa.core.stats.StatWriter$$anonfun$write$2.apply(StatWriter.scala:174)
>>> >>         at org.locationtech.geomesa.core.stats.StatWriter$$anonfun$write$2.apply(StatWriter.scala:156)
>>> >>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>>> >>         at org.locationtech.geomesa.core.stats.StatWriter$.write(StatWriter.scala:156)
>>> >>         at org.locationtech.geomesa.core.stats.StatWriter$.drainQueue(StatWriter.scala:148)
>>> >>         at org.locationtech.geomesa.core.stats.StatWriter$.run(StatWriter.scala:116)
>>> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> >>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>> >>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>> >>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>         at java.lang.Thread.run(Thread.java:745)
>>> >>
>>> >> This is more annoying than a real problem. I am new to both accumulo
>>> >> and geomesa, but I am curious what the problem might be.
>>> >>
>>> >> Thanks!
>>> >> -Simon
>>> >>
>>> >>
>>> >> On Sat, Jun 6, 2015 at 8:01 PM, Josh Elser <[email protected]> wrote:
>>> >>>
>>> >>> Great! Glad to hear it. Please let us know how it works out!
>>> >>>
>>> >>>
>>> >>> Xu (Simon) Chen wrote:
>>> >>>>
>>> >>>> Josh,
>>> >>>>
>>> >>>> You're right again.. Thanks!
>>> >>>>
>>> >>>> My ansible play actually pushed client.conf to all the server config
>>> >>>> directories, but didn't do anything for the clients, and that's my
>>> >>>> problem. Now kerberos is working great for me.
>>> >>>>
>>> >>>> Thanks again!
>>> >>>> -Simon
>>> >>>>
>>> >>>> On Sat, Jun 6, 2015 at 5:04 PM, Josh Elser <[email protected]> wrote:
>>> >>>>>
>>> >>>>> Simon,
>>> >>>>>
>>> >>>>> Did you create a client configuration file (~/.accumulo/config or
>>> >>>>> $ACCUMULO_CONF_DIR/client.conf)? You need to configure Accumulo
>>> >>>>> clients to actually use SASL when you're trying to use Kerberos
>>> >>>>> authentication. Your server is expecting that, but I would venture
>>> >>>>> a guess that your client isn't.
>>> >>>>>
>>> >>>>> See
>>> >>>>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_configuration_3
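>>> >>>>>
>>> >>>>> For a kerberized client, client.conf needs roughly the following
>>> >>>>> (a sketch; kerberos.server.primary only matters if your servers
>>> >>>>> don't use the default "accumulo" primary):
>>> >>>>>
>>> >>>>> instance.rpc.sasl.enabled=true
>>> >>>>> kerberos.server.primary=accumulo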
>>> >>>>>
>>> >>>>>
>>> >>>>> Xu (Simon) Chen wrote:
>>> >>>>>>
>>> >>>>>> Josh,
>>> >>>>>>
>>> >>>>>> Thanks. It makes sense...
>>> >>>>>>
>>> >>>>>> I used a KerberosToken, but my program got stuck when running the
>>> >>>>>> following:
>>> >>>>>> new ZooKeeperInstance(instance, zookeepers).getConnector(user, krbToken)
>>> >>>>>>
>>> >>>>>> It looks like my client is stuck here:
>>> >>>>>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java#L70
>>> >>>>>> failing in the receive part of
>>> >>>>>> org.apache.accumulo.core.client.impl.thrift.ClientService.Client.authenticate().
>>> >>>>>>
>>> >>>>>> On my tservers, I see the following:
>>> >>>>>>
>>> >>>>>> 2015-06-06 18:58:19,616 [server.TThreadPoolServer] ERROR: Error
>>> >>>>>> occurred during processing of message.
>>> >>>>>> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
>>> >>>>>>         at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
>>> >>>>>>         at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
>>> >>>>>>         at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
>>> >>>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>> >>>>>>         at javax.security.auth.Subject.doAs(Subject.java:356)
>>> >>>>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1622)
>>> >>>>>>         at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
>>> >>>>>>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
>>> >>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>>>>         at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>>> >>>>>>         at java.lang.Thread.run(Thread.java:745)
>>> >>>>>> Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
>>> >>>>>>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>>> >>>>>>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>> >>>>>>         at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
>>> >>>>>>         at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
>>> >>>>>>         at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
>>> >>>>>>         at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>>> >>>>>>         at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>>> >>>>>>         ... 11 more
>>> >>>>>> Caused by: java.net.SocketTimeoutException: Read timed out
>>> >>>>>>         at java.net.SocketInputStream.socketRead0(Native Method)
>>> >>>>>>         at java.net.SocketInputStream.read(SocketInputStream.java:152)
>>> >>>>>>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>>> >>>>>>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>>> >>>>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>> >>>>>>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>>> >>>>>>         ... 17 more
>>> >>>>>>
>>> >>>>>> Any ideas why?
>>> >>>>>>
>>> >>>>>> Thanks.
>>> >>>>>> -Simon
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Sat, Jun 6, 2015 at 2:19 PM, Josh Elser <[email protected]> wrote:
>>> >>>>>>>
>>> >>>>>>> Make sure you read the JavaDoc on DelegationToken:
>>> >>>>>>>
>>> >>>>>>> <snip>
>>> >>>>>>> Obtain a delegation token by calling {@link
>>> >>>>>>> SecurityOperations#getDelegationToken(org.apache.accumulo.core.client.admin.DelegationTokenConfig)}
>>> >>>>>>> </snip>
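>>> >>>>>>>
>>> >>>>>>> E.g., roughly (a sketch; the no-arg DelegationTokenConfig just
>>> >>>>>>> gives you the default token lifetime):
>>> >>>>>>>
>>> >>>>>>> val conn = instance.getConnector(principal, new KerberosToken())
>>> >>>>>>> val delToken = conn.securityOperations().getDelegationToken(new DelegationTokenConfig())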
>>> >>>>>>>
>>> >>>>>>> You cannot create a usable DelegationToken as the client itself.
>>> >>>>>>>
>>> >>>>>>> Anyways, DelegationTokens are only relevant in cases where the
>>> >>>>>>> client Kerberos credentials are unavailable. The most common case
>>> >>>>>>> is running MapReduce jobs. If you are just interacting with
>>> >>>>>>> Accumulo through the Java API, the KerberosToken is all you need
>>> >>>>>>> to use.
>>> >>>>>>>
>>> >>>>>>> The user-manual likely just needs to be updated. I believe the
>>> >>>>>>> DelegationTokenConfig was added after I wrote the initial
>>> >>>>>>> documentation.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> Xu (Simon) Chen wrote:
>>> >>>>>>>>
>>> >>>>>>>> Hi folks,
>>> >>>>>>>>
>>> >>>>>>>> The latest kerberos doc seems to indicate that getDelegationToken
>>> >>>>>>>> can be called without any parameters:
>>> >>>>>>>> https://github.com/apache/accumulo/blob/1.7/docs/src/main/asciidoc/chapters/kerberos.txt#L410
>>> >>>>>>>>
>>> >>>>>>>> Yet the source code indicates a DelegationTokenConfig object must
>>> >>>>>>>> be passed in:
>>> >>>>>>>> https://github.com/apache/accumulo/blob/1.7/core/src/main/java/org/apache/accumulo/core/client/admin/SecurityOperations.java#L359
>>> >>>>>>>>
>>> >>>>>>>> Any ideas on how I should construct the DelegationTokenConfig
>>> >>>>>>>> object?
>>> >>>>>>>>
>>> >>>>>>>> For context, I've been trying to get geomesa to work on my
>>> >>>>>>>> accumulo 1.7 with kerberos turned on. Right now, the code is
>>> >>>>>>>> somewhat tied to password auth:
>>> >>>>>>>> https://github.com/locationtech/geomesa/blob/rc7_a1.7_h2.5/geomesa-core/src/main/scala/org/locationtech/geomesa/core/data/AccumuloDataStoreFactory.scala#L177
>>> >>>>>>>> My thought is that I should get a KerberosToken first, and then
>>> >>>>>>>> try to generate a DelegationToken, which is passed back for later
>>> >>>>>>>> interactions between geomesa and accumulo.
>>> >>>>>>>>
>>> >>>>>>>> Thanks.
>>> >>>>>>>> -Simon
>>>
>>>
>