I've been trying to find a way around the problem I mentioned. Is this documentation outdated?
https://github.com/apache/accumulo/blob/branch-1.7.0/docs/src/main/asciidoc/chapters/analytics.txt#L72
I don't think setInputInfo is available in 1.7 anymore.

On Sun, Jun 7, 2015 at 6:20 PM, Xu (Simon) Chen <[email protected]> wrote:
> I ran into another problem when trying geomesa+spark:
> http://www.geomesa.org/spark/
>
> For some reason, running the geomesa+spark example resulted in the
> following error:
>
> scala> queryRdd.count
> java.lang.NullPointerException
>   at org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.unwrapAuthenticationToken(ConfiguratorBase.java:493)
>   at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.validateOptions(AbstractInputFormat.java:390)
>   at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.getSplits(AbstractInputFormat.java:668)
>   at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
>   at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
>   …
>
> hadoopToken is probably null here:
> https://github.com/apache/accumulo/blob/branch-1.7.0/core/src/main/java/org/apache/accumulo/core/client/mapreduce/lib/impl/ConfiguratorBase.java#L493
>
> I suspect it's related to GeoMesa's way of creating the RDD. The following
> doesn't seem to be sufficient:
> https://github.com/locationtech/geomesa/blob/rc7_a1.7_h2.5/geomesa-compute/src/main/scala/org/locationtech/geomesa/compute/spark/GeoMesaSpark.scala#L69
>
> That's because Accumulo adds the Hadoop token to the hadoop.mapreduce.Job,
> in addition to the Configuration:
> https://github.com/apache/accumulo/blob/branch-1.7.0/core/src/main/java/org/apache/accumulo/core/client/mapreduce/AbstractInputFormat.java#L130
>
> Spark's newAPIHadoopRDD, however, only takes a hadoop.conf.Configuration
> object rather than a Job:
> https://github.com/apache/spark/blob/v1.3.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L870
>
> This seems to be a more general issue between Accumulo and Spark. So,
> has anyone tried using Spark to access data from a kerberized Accumulo
> cluster?
>
> Thanks.
> -Simon
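For reference: setInputInfo was removed by 1.7; connector info, instance, and input table are now configured separately. A minimal sketch of the setup the quoted message describes, assuming Accumulo 1.7 and Spark 1.3 on the classpath; "someuser@REALM" and "sometable" are placeholders, and note that from Scala the static helpers have to be called on the class that declares them (AbstractInputFormat/InputFormatBase), not on AccumuloInputFormat:

import org.apache.accumulo.core.client.ClientConfiguration
import org.apache.accumulo.core.client.mapreduce.{AbstractInputFormat, AccumuloInputFormat, InputFormatBase}
import org.apache.accumulo.core.client.security.tokens.KerberosToken
import org.apache.accumulo.core.data.{Key, Value}
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext

def newAccumuloRdd(sc: SparkContext) = {
  val job = Job.getInstance()
  // Configure the instance first: given a KerberosToken, setConnectorInfo
  // contacts the cluster to obtain a delegation token.
  AbstractInputFormat.setZooKeeperInstance(job,
    ClientConfiguration.loadDefault().withSasl(true))
  AbstractInputFormat.setConnectorInfo(job, "someuser@REALM", new KerberosToken())
  InputFormatBase.setInputTableName(job, "sometable")

  // The rub described above: the delegation token lands in the Job's
  // credentials, but Spark is only handed the Configuration, so the token
  // never reaches getSplits() -- hence the NPE in unwrapAuthenticationToken().
  sc.newAPIHadoopRDD(job.getConfiguration,
    classOf[AccumuloInputFormat], classOf[Key], classOf[Value])
}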
> On Sun, Jun 7, 2015 at 3:54 PM, James Hughes <[email protected]> wrote:
>> Josh,
>>
>> Thanks. That's more or less what I expected.
>>
>> As we work to transition from Mock to MiniAccumulo, I'd want to change from
>> spinning up lots of MockInstances to one MiniAccumulo. To understand that
>> pattern, do I basically just need to read through the test sub-module and
>> test/pom.xml? Are there any other resources I should be checking out?
>>
>> Cheers,
>>
>> Jim
>>
>> On Sun, Jun 7, 2015 at 1:37 PM, Josh Elser <[email protected]> wrote:
>>> MiniAccumulo, yes. MockAccumulo, no. In general, we've almost completely
>>> moved away from MockAccumulo. I wouldn't be surprised if it gets
>>> deprecated and removed soon.
>>>
>>> https://github.com/apache/accumulo/blob/1.7/test/src/test/java/org/apache/accumulo/test/functional/KerberosIT.java
>>>
>>> Apache Directory provides a MiniKdc that can be used easily with
>>> MiniAccumulo. Many of the integration tests have already been altered to
>>> support running with or without Kerberos.
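For anyone making the same Mock-to-Mini move: a minimal script-style MiniAccumuloCluster harness (without the MiniKdc wiring that KerberosIT demonstrates) might look like the sketch below; the directory, credentials, and table name are illustrative.

import java.nio.file.Files
import org.apache.accumulo.minicluster.MiniAccumuloCluster

// A throwaway single-node Accumulo, backed by a temp directory.
val dir = Files.createTempDirectory("mini-accumulo").toFile
val mini = new MiniAccumuloCluster(dir, "rootPassword")
mini.start()

// Connect exactly as a real client would; no MockInstance involved.
val conn = mini.getConnector("root", "rootPassword")
conn.tableOperations().create("sometable")
// ... run the code under test against conn ...

mini.stop()

Unlike MockInstance, this exercises the real RPC and iterator stacks, which is why the integration tests can toggle Kerberos on and off around it.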
>>> James Hughes wrote:
>>>> Hi all,
>>>>
>>>> For GeoMesa, stats writing is quite secondary and optional, so we can
>>>> sort that out as a follow-on to seeing GeoMesa work with Accumulo 1.7.
>>>>
>>>> I haven't had a chance to read in detail yet, so forgive me if this is
>>>> covered in the docs. Does either Mock or MiniAccumulo provide enough
>>>> hooks to test out Kerberos integration effectively? I suppose I'm
>>>> really asking what kind of testing environment a project like GeoMesa
>>>> would need to use to test out Accumulo 1.7.
>>>>
>>>> Even though MockAccumulo has a number of limitations, it is rather
>>>> effective in unit tests, which can be part of a quick build.
>>>>
>>>> Thanks,
>>>>
>>>> Jim
>>>>
>>>> On Sat, Jun 6, 2015 at 11:14 PM, Xu (Simon) Chen <[email protected]> wrote:
>>>>> Nope, I am running the example as the readme file suggests:
>>>>>
>>>>> java -cp ./target/geomesa-quickstart-1.0-SNAPSHOT.jar \
>>>>>   org.geomesa.QuickStart -instanceId somecloud -zookeepers \
>>>>>   "zoo1:2181,zoo2:2181,zoo3:2181" -user someuser -password somepwd \
>>>>>   -tableName sometable
>>>>>
>>>>> I'll raise this question with the geomesa folks, but you're right that
>>>>> I can ignore it for now...
>>>>>
>>>>> Thanks!
>>>>> -Simon
>>>>>
>>>>> On Sat, Jun 6, 2015 at 11:00 PM, Josh Elser <[email protected]> wrote:
>>>>>> Are you running it via `mvn exec:java` by chance, or NetBeans?
>>>>>>
>>>>>> http://mail-archives.apache.org/mod_mbox/accumulo-user/201411.mbox/%[email protected]%3E
>>>>>>
>>>>>> If that's just a background thread writing in Stats, it might just be
>>>>>> a factor of how you're invoking the program and you can ignore it. I
>>>>>> don't know enough about the inner workings of GeoMesa to say one way
>>>>>> or the other.
>>>>>>
>>>>>> Xu (Simon) Chen wrote:
>>>>>>> Josh,
>>>>>>>
>>>>>>> Everything works well, except for one thing :-)
>>>>>>>
>>>>>>> I am running the geomesa-quickstart program that ingests some data
>>>>>>> and then performs a simple query:
>>>>>>> https://github.com/geomesa/geomesa-quickstart
>>>>>>>
>>>>>>> For some reason, the following error is emitted consistently at the
>>>>>>> end of the execution, after outputting the correct result:
>>>>>>>
>>>>>>> 15/06/07 00:29:22 INFO zookeeper.ZooCache: Zookeeper error, will retry
>>>>>>> java.lang.InterruptedException
>>>>>>>   at java.lang.Object.wait(Native Method)
>>>>>>>   at java.lang.Object.wait(Object.java:503)
>>>>>>>   at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
>>>>>>>   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
>>>>>>>   at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:264)
>>>>>>>   at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:162)
>>>>>>>   at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:289)
>>>>>>>   at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:238)
>>>>>>>   at org.apache.accumulo.core.client.impl.Tables.getTableState(Tables.java:180)
>>>>>>>   at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:82)
>>>>>>>   at org.apache.accumulo.core.client.impl.ConnectorImpl.createBatchWriter(ConnectorImpl.java:128)
>>>>>>>   at org.locationtech.geomesa.core.stats.StatWriter$$anonfun$write$2.apply(StatWriter.scala:174)
>>>>>>>   at org.locationtech.geomesa.core.stats.StatWriter$$anonfun$write$2.apply(StatWriter.scala:156)
>>>>>>>   at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>>>>>>>   at org.locationtech.geomesa.core.stats.StatWriter$.write(StatWriter.scala:156)
>>>>>>>   at org.locationtech.geomesa.core.stats.StatWriter$.drainQueue(StatWriter.scala:148)
>>>>>>>   at org.locationtech.geomesa.core.stats.StatWriter$.run(StatWriter.scala:116)
>>>>>>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>>>>
>>>>>>> This is more annoying than a real problem. I am new to both Accumulo
>>>>>>> and GeoMesa, but I am curious what the problem might be.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> -Simon
>>>>>>>
>>>>>>> On Sat, Jun 6, 2015 at 8:01 PM, Josh Elser <[email protected]> wrote:
>>>>>>>> Great! Glad to hear it. Please let us know how it works out!
>>>>>>>> Xu (Simon) Chen wrote:
>>>>>>>>> Josh,
>>>>>>>>>
>>>>>>>>> You're right again... Thanks!
>>>>>>>>>
>>>>>>>>> My ansible play actually pushed client.conf to all the server config
>>>>>>>>> directories, but didn't do anything for the clients, and that's my
>>>>>>>>> problem. Now kerberos is working great for me.
>>>>>>>>>
>>>>>>>>> Thanks again!
>>>>>>>>> -Simon
>>>>>>>>>
>>>>>>>>> On Sat, Jun 6, 2015 at 5:04 PM, Josh Elser <[email protected]> wrote:
>>>>>>>>>> Simon,
>>>>>>>>>>
>>>>>>>>>> Did you create a client configuration file (~/.accumulo/config or
>>>>>>>>>> $ACCUMULO_CONF_DIR/client.conf)? You need to configure Accumulo
>>>>>>>>>> clients to actually use SASL when you're trying to use Kerberos
>>>>>>>>>> authentication. Your server is expecting that, but I would venture
>>>>>>>>>> a guess that your client isn't.
>>>>>>>>>>
>>>>>>>>>> See
>>>>>>>>>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_configuration_3
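To make that concrete, a minimal client configuration enabling SASL might look like the following sketch; "accumulo" is the conventional first component of the server principal, so adjust it if your deployment uses a different primary:

# ~/.accumulo/config or $ACCUMULO_CONF_DIR/client.conf
instance.rpc.sasl.enabled=true
kerberos.server.primary=accumulo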
>>>>>>>>>> Xu (Simon) Chen wrote:
>>>>>>>>>>> Josh,
>>>>>>>>>>>
>>>>>>>>>>> Thanks. It makes sense...
>>>>>>>>>>>
>>>>>>>>>>> I used a KerberosToken, but my program got stuck when running the
>>>>>>>>>>> following:
>>>>>>>>>>>
>>>>>>>>>>> new ZooKeeperInstance(instance, zookeepers).getConnector(user, krbToken)
>>>>>>>>>>>
>>>>>>>>>>> It looks like my client is stuck here:
>>>>>>>>>>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java#L70
>>>>>>>>>>> failing in the receive part of
>>>>>>>>>>> org.apache.accumulo.core.client.impl.thrift.ClientService.Client.authenticate().
>>>>>>>>>>>
>>>>>>>>>>> On my tservers, I see the following:
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-06 18:58:19,616 [server.TThreadPoolServer] ERROR: Error
>>>>>>>>>>> occurred during processing of message.
>>>>>>>>>>> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException:
>>>>>>>>>>> java.net.SocketTimeoutException: Read timed out
>>>>>>>>>>>   at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
>>>>>>>>>>>   at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
>>>>>>>>>>>   at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
>>>>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>>   at javax.security.auth.Subject.doAs(Subject.java:356)
>>>>>>>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1622)
>>>>>>>>>>>   at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
>>>>>>>>>>>   at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
>>>>>>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>>   at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>>>>>>>>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>>>>>>>> Caused by: org.apache.thrift.transport.TTransportException:
>>>>>>>>>>> java.net.SocketTimeoutException: Read timed out
>>>>>>>>>>>   at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>>>>>>>>>>>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>>>>>>>>>>   at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
>>>>>>>>>>>   at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
>>>>>>>>>>>   at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
>>>>>>>>>>>   at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>>>>>>>>>>>   at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>>>>>>>>>>>   ... 11 more
>>>>>>>>>>> Caused by: java.net.SocketTimeoutException: Read timed out
>>>>>>>>>>>   at java.net.SocketInputStream.socketRead0(Native Method)
>>>>>>>>>>>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>>>>>>>>>>>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>>>>>>>>>>>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>>>>>>>>>>>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>>>>>>>>>>   at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>>>>>>>>>>>   ... 17 more
>>>>>>>>>>>
>>>>>>>>>>> Any ideas why?
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>> -Simon
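The read timeout above is consistent with a non-SASL client handshaking with a SASL-enabled server. A hedged sketch of the client side with SASL forced on programmatically (principal, keytab path, instance name, and ZooKeeper hosts are placeholders):

import java.io.File
import org.apache.accumulo.core.client.{ClientConfiguration, ZooKeeperInstance}
import org.apache.accumulo.core.client.security.tokens.KerberosToken

val principal = "someuser@EXAMPLE.COM"
// Logs in from the keytab and replaces the current JAAS user.
val token = new KerberosToken(principal, new File("/path/to/someuser.keytab"), true)

// withSasl(true) is what client.conf would otherwise supply; without it
// the client speaks plain Thrift and the server's SASL handshake times
// out, much like the tserver trace above.
val clientConf = ClientConfiguration.loadDefault()
  .withInstance("somecloud")
  .withZkHosts("zoo1:2181,zoo2:2181,zoo3:2181")
  .withSasl(true)

val conn = new ZooKeeperInstance(clientConf).getConnector(principal, token)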
>>>>>>>>>>> On Sat, Jun 6, 2015 at 2:19 PM, Josh Elser <[email protected]> wrote:
>>>>>>>>>>>> Make sure you read the JavaDoc on DelegationToken:
>>>>>>>>>>>>
>>>>>>>>>>>> <snip>
>>>>>>>>>>>> Obtain a delegation token by calling {@link
>>>>>>>>>>>> SecurityOperations#getDelegationToken(org.apache.accumulo.core.client.admin.DelegationTokenConfig)}
>>>>>>>>>>>> </snip>
>>>>>>>>>>>>
>>>>>>>>>>>> You cannot create a usable DelegationToken as the client itself.
>>>>>>>>>>>>
>>>>>>>>>>>> Anyway, DelegationTokens are only relevant in cases where the
>>>>>>>>>>>> client's Kerberos credentials are unavailable. The most common
>>>>>>>>>>>> case is running MapReduce jobs. If you are just interacting with
>>>>>>>>>>>> Accumulo through the Java API, the KerberosToken is all you need
>>>>>>>>>>>> to use.
>>>>>>>>>>>>
>>>>>>>>>>>> The user manual likely just needs to be updated. I believe the
>>>>>>>>>>>> DelegationTokenConfig was added after I wrote the initial
>>>>>>>>>>>> documentation.
>>>>>>>>>>>> Xu (Simon) Chen wrote:
>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The latest kerberos doc seems to indicate that getDelegationToken
>>>>>>>>>>>>> can be called without any parameters:
>>>>>>>>>>>>> https://github.com/apache/accumulo/blob/1.7/docs/src/main/asciidoc/chapters/kerberos.txt#L410
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yet the source code indicates a DelegationTokenConfig object must
>>>>>>>>>>>>> be passed in:
>>>>>>>>>>>>> https://github.com/apache/accumulo/blob/1.7/core/src/main/java/org/apache/accumulo/core/client/admin/SecurityOperations.java#L359
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas on how I should construct the DelegationTokenConfig
>>>>>>>>>>>>> object?
>>>>>>>>>>>>>
>>>>>>>>>>>>> For context, I've been trying to get geomesa to work on my
>>>>>>>>>>>>> accumulo 1.7 with kerberos turned on. Right now, the code is
>>>>>>>>>>>>> somewhat tied to password auth:
>>>>>>>>>>>>> https://github.com/locationtech/geomesa/blob/rc7_a1.7_h2.5/geomesa-core/src/main/scala/org/locationtech/geomesa/core/data/AccumuloDataStoreFactory.scala#L177
>>>>>>>>>>>>>
>>>>>>>>>>>>> My thought is that I should get a KerberosToken first, and then
>>>>>>>>>>>>> try to generate a DelegationToken, which is passed back for later
>>>>>>>>>>>>> interactions between geomesa and accumulo.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>> -Simon
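To close the loop on the DelegationTokenConfig question: a default-constructed config is sufficient, and the lifetime setter is optional. A sketch, assuming a Connector obtained with a KerberosToken as in the earlier example:

import java.util.concurrent.TimeUnit
import org.apache.accumulo.core.client.Connector
import org.apache.accumulo.core.client.admin.DelegationTokenConfig
import org.apache.accumulo.core.client.security.tokens.DelegationToken

// conn must have been obtained with a KerberosToken; only the servers can
// mint delegation tokens, so clients cannot construct a usable one locally.
def fetchDelegationToken(conn: Connector): DelegationToken = {
  // new DelegationTokenConfig() alone works; a requested lifetime is
  // capped by the cluster's configured maximum.
  val config = new DelegationTokenConfig().setTokenLifetime(12, TimeUnit.HOURS)
  // The returned token can then be shipped with a MapReduce or Spark job
  // in place of the KerberosToken.
  conn.securityOperations().getDelegationToken(config)
}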
