Generally, it's not a good idea to assume that you can always locate the correct ClientConfiguration from the local filesystem.

For example, YARN nodes might not have Accumulo installed, or might even point to the wrong Accumulo instance. The Job's Configuration serves as the de facto place for _all_ information that your Job needs to perform its work.

Can you try calling AccumuloInputFormat.setZooKeeperInstance(Job, ClientConfiguration) instead?
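
For example, something like this (an untested sketch; the instance name
and ZooKeeper quorum are placeholders, and `job` is the Hadoop Job you
are configuring):

    import org.apache.accumulo.core.client.ClientConfiguration;
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;

    // Build the ClientConfiguration explicitly and store it in the Job,
    // so every task sees instance.rpc.sasl.enabled regardless of what is
    // on the local filesystem of the node it lands on.
    ClientConfiguration clientConf = ClientConfiguration.loadDefault()
        .withInstance("myInstance")   // placeholder instance name
        .withZkHosts("zk1:2181")      // placeholder ZooKeeper quorum
        .withSasl(true);              // serialized into the Job's Configuration
    AccumuloInputFormat.setZooKeeperInstance(job, clientConf);

loadDefault() runs on the submitting client (where your
~/.accumulo/config is correct); the result travels inside the Job
rather than being re-read on each node.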

Great work tracking this down, Simon!

Xu (Simon) Chen wrote:
Ah, I found the problem...

In the Hadoop chain of events, this is eventually called, because
clientConfigString is not null (it contains instance.name and
instance.zookeeper.host):
https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapreduce/lib/impl/ConfiguratorBase.java#L382

The deserialize function unfortunately doesn't load the default
options, so the SASL setting from ~/.accumulo/config is left out:
https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java#L235

Would it be reasonable for deserialize to load the default settings?
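
Roughly like this, I mean (an untested sketch; the method name is
invented, and the property accessors come from commons-configuration,
which ClientConfiguration extends):

    import java.util.Iterator;

    // Hypothetical variant of deserialize() that falls back to the
    // default search path (~/.accumulo/config, etc.) for any key the
    // serialized config did not carry.
    public static ClientConfiguration deserializeWithDefaults(String serializedConfig) {
      ClientConfiguration conf = ClientConfiguration.deserialize(serializedConfig);
      ClientConfiguration defaults = ClientConfiguration.loadDefault();
      for (Iterator<?> keys = defaults.getKeys(); keys.hasNext();) {
        String key = String.valueOf(keys.next());
        if (!conf.containsKey(key)) {
          conf.setProperty(key, defaults.getString(key));
        }
      }
      return conf;
    }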

-Simon


On Fri, Jun 12, 2015 at 11:34 AM, Josh Elser <josh.el...@gmail.com> wrote:
Just be careful with the mapreduce classes. I wouldn't be surprised if we
try to avoid any locally installed client.conf in MapReduce (using only the
ClientConfiguration stored inside the Job).

Will wait to hear back from you :)

Xu (Simon) Chen wrote:
Emm.. I have ~/.accumulo/config with "instance.rpc.sasl.enabled=true".
That property is indeed propagated into the ClientConfiguration the
first time - that's why I said the token worked initially.

Apparently, in the Hadoop portion that property is not set, as I found
by adding some debug messages to the ZooKeeperInstance class. I think
that's likely the issue.
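
(A check like the following makes the difference visible - a
hypothetical snippet, where clientConf is whatever ClientConfiguration
the code path under inspection is holding:)

    // prints true locally, but false inside the Hadoop code path
    System.out.println(clientConf.getBoolean("instance.rpc.sasl.enabled", false));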

So the ZooKeeperInstance is created via the following call sequence:

https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapred/AbstractInputFormat.java#L341

https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapreduce/lib/impl/InputConfigurator.java#L671

https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapreduce/lib/impl/ConfiguratorBase.java#L361

The getClientConfiguration function eventually calls
getDefaultSearchPath(), so my ~/.accumulo/config should be searched. I
think we are close to the root cause... Will update when I find out more.

Thanks!
-Simon

On Thu, Jun 11, 2015 at 11:28 PM, Josh Elser <josh.el...@gmail.com> wrote:
  >  Are you sure that the spark tasks have the proper ClientConfiguration?
  >  They need to have instance.rpc.sasl.enabled. I believe you should be
  >  able to set this via the AccumuloInputFormat.
  >
  >  You can turn up logging (org.apache.accumulo.core.client=TRACE) and/or
  >  set the system property -Dsun.security.krb5.debug=true to get some more
  >  information as to why the authentication is failing.
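  >
  >  For example, from code (an untested sketch, assuming log4j 1.x is on
  >  the classpath):
  >
  >    // Raise the Accumulo client logging to TRACE.
  >    org.apache.log4j.Logger.getLogger("org.apache.accumulo.core.client")
  >        .setLevel(org.apache.log4j.Level.TRACE);
  >    // Turn on JDK Kerberos debugging; set this before the first login.
  >    System.setProperty("sun.security.krb5.debug", "true");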
  >
  >
  >  Xu (Simon) Chen wrote:
  >>
  >>  Josh,
  >>
  >>  I am using this function:
  >>
  >>  https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapred/AbstractInputFormat.java#L106
  >>
  >>  If I pass in a KerberosToken, it's stuck at line 111; if I pass in a
  >>  delegation token, the setConnectorInfo function finishes fine.
  >>
  >>  But when I do something like queryRDD.count, spark eventually calls
  >>  HadoopRDD.getPartitions, which calls the following and gets stuck in
  >>  the last authenticate() function:
  >>
  >>  https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapred/AbstractInputFormat.java#L621
  >>
  >>  https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/mapred/AbstractInputFormat.java#L348
  >>
  >>  https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/ZooKeeperInstance.java#L248
  >>
  >>  https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java#L70
  >>
  >>  Which is essentially the same place where it would be stuck with a
  >>  KerberosToken.
  >>
  >>  -Simon
  >>
  >>  On Thu, Jun 11, 2015 at 9:41 PM, Josh Elser <josh.el...@gmail.com> wrote:
  >>>
  >>>  What are the Accumulo methods that you are calling, and what is the
  >>>  error you are seeing?
  >>>
  >>>  A KerberosToken cannot be used in a MapReduce job, which is why a
  >>>  DelegationToken is automatically retrieved. You should still be able
  >>>  to provide your own DelegationToken -- if that doesn't work, that's a
  >>>  bug.
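  >>>
  >>>  Something along these lines (an untested sketch against the 1.7 API,
  >>>  exception handling omitted; instance, principal, and jobConf are
  >>>  whatever you already have in hand):
  >>>
  >>>    import org.apache.accumulo.core.client.Connector;
  >>>    import org.apache.accumulo.core.client.admin.DelegationTokenConfig;
  >>>    import org.apache.accumulo.core.client.mapred.AccumuloInputFormat;
  >>>    import org.apache.accumulo.core.client.security.tokens.AuthenticationToken;
  >>>    import org.apache.accumulo.core.client.security.tokens.KerberosToken;
  >>>
  >>>    // While the Kerberos login is still live, trade it for a
  >>>    // DelegationToken up front...
  >>>    Connector conn = instance.getConnector(principal, new KerberosToken());
  >>>    AuthenticationToken dt =
  >>>        conn.securityOperations().getDelegationToken(new DelegationTokenConfig());
  >>>    // ...and hand that token to the input format.
  >>>    AccumuloInputFormat.setConnectorInfo(jobConf, principal, dt);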
  >>>
  >>>  Xu (Simon) Chen wrote:
  >>>>
  >>>>  I actually added a flag such that I can pass in either a KerberosToken
  >>>>  or a DelegationTokenImpl to accumulo.
  >>>>
  >>>>  Actually, when a KerberosToken is passed in, accumulo converts it to a
  >>>>  DelegationToken - the conversion is where I am having trouble. I tried
  >>>>  passing in a delegation token directly to bypass the conversion, but a
  >>>>  similar problem happens: I am stuck at authenticate on the client
  >>>>  side, and the server side outputs the same output...
  >>>>
  >>>>  On Thursday, June 11, 2015, Josh Elser <josh.el...@gmail.com> wrote:
  >>>>
  >>>>       Keep in mind that the authentication paths for DelegationToken
  >>>>       (mapreduce) and KerberosToken are completely different.
  >>>>
  >>>>       Since most mapreduce jobs have multiple mappers (or reducers), I
  >>>>       expect we would have run into the case that the same
  >>>>       DelegationToken was used multiple times. It would still be good
  >>>>       to narrow down the scope of the problem.
  >>>>
  >>>>       Xu (Simon) Chen wrote:
  >>>>
  >>>>           Thanks Josh...
  >>>>
  >>>>           I tested this in the scala REPL and called
  >>>>           DataStoreFinder.getDataStore() multiple times; each time it
  >>>>           seems to be reusing the same KerberosToken object, and it
  >>>>           works fine each time.
  >>>>
  >>>>           So my problem only happens when the token is used in
  >>>>           accumulo's mapred package. Weird..
  >>>>
  >>>>           -Simon
  >>>>
  >>>>
  >>>>           On Thu, Jun 11, 2015 at 5:29 PM, Josh Elser
  >>>>           <josh.el...@gmail.com> wrote:
  >>>>
  >>>>               Simon,
  >>>>
  >>>>               Can you reproduce this in plain-jane Java code? I don't
  >>>>               know enough about spark/scala, much less what Geomesa is
  >>>>               actually doing, to know what the issue is.
  >>>>
  >>>>               Also, which token are you referring to: a KerberosToken
  >>>>               or a DelegationToken? Either of them should be usable as
  >>>>               many times as you'd like (given the underlying
  >>>>               credentials are still available for the KT, or the DT
  >>>>               token hasn't yet expired).
  >>>>
  >>>>
  >>>>               Xu (Simon) Chen wrote:
  >>>>
  >>>>                   Folks,
  >>>>
  >>>>                   I am working on geomesa+accumulo+spark integration.
  >>>>                   For some reason, I found that the same token cannot
  >>>>                   be used to authenticate twice.
  >>>>
  >>>>                   The workflow is that geomesa would try to create a
  >>>>                   hadoop rdd, during which it tries to create an
  >>>>                   AccumuloDataStore:
  >>>>
  >>>>                   https://github.com/locationtech/geomesa/blob/master/geomesa-compute/src/main/scala/org/locationtech/geomesa/compute/spark/GeoMesaSpark.scala#L81
  >>>>
  >>>>                   During this process, a ZooKeeperInstance is created:
  >>>>
  >>>>                   https://github.com/locationtech/geomesa/blob/rc7_a1.7_h2.5/geomesa-core/src/main/scala/org/locationtech/geomesa/core/data/AccumuloDataStoreFactory.scala#L177
  >>>>
  >>>>                   I modified geomesa such that it would use kerberos
  >>>>                   to authenticate here. This step works fine.
  >>>>
  >>>>                   But next, geomesa calls
  >>>>                   ConfiguratorBase.setConnectorInfo:
  >>>>
  >>>>                   https://github.com/locationtech/geomesa/blob/rc7_a1.7_h2.5/geomesa-compute/src/main/scala/org/locationtech/geomesa/compute/spark/GeoMesaSpark.scala#L69
  >>>>
  >>>>                   This uses the same token and the same zookeeper URI,
  >>>>                   but for some reason it gets stuck in spark-shell,
  >>>>                   and the following is output on the tserver side:
  >>>>
  >>>>                   2015-06-06 18:58:19,616 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
  >>>>                   java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
  >>>>                     at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
  >>>>                     at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
  >>>>                     at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
  >>>>                     at java.security.AccessController.doPrivileged(Native Method)
  >>>>                     at javax.security.auth.Subject.doAs(Subject.java:356)
  >>>>                     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1622)
  >>>>                     at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
  >>>>                     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
  >>>>                     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  >>>>                     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  >>>>                     at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
  >>>>                     at java.lang.Thread.run(Thread.java:745)
  >>>>                   Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
  >>>>                     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
  >>>>                     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
  >>>>                     at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
  >>>>                     at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
  >>>>                     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
  >>>>                     at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
  >>>>                     at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
  >>>>                     ... 11 more
  >>>>                   Caused by: java.net.SocketTimeoutException: Read timed out
  >>>>                     at java.net.SocketInputStream.socketRead0(Native Method)
  >>>>                     at java.net.SocketInputStream.read(SocketInputStream.java:152)
  >>>>                     at java.net.SocketInputStream.read(SocketInputStream.java:122)
  >>>>                     at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
  >>>>                     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
  >>>>                     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
  >>>>                     ... 17 more
  >>>>
  >>>>                   Any idea why?
  >>>>
  >>>>                   Thanks.
  >>>>                   -Simon
  >>>>
  >
