Spark connecting to Hive in another EMR cluster

2016-06-24 Thread Dave Maughan
Hi,

We're trying to get a Spark (1.6.1) job running on EMR (4.7.1) that's
connecting to the Hive metastore in another EMR cluster. A simplified
version of what we're doing is below:

val sparkConf = new SparkConf().setAppName("MyApp")
val sc = new SparkContext(sparkConf)
val sqlContext = new HiveContext(sc)

sqlContext.setConf("hive.metastore.uris", "thrift://other.emr.master:9083")

sqlContext.sql("use db1") // any statement using the sql method

sqlContext.table("db2.table1").show

The above works, but if we remove the sql call then table("db2.table1").show
fails with a NoSuchTableException, indicating it hasn't connected to the
external metastore.
So it seems something in the sql method's code path connects to the
metastore in a way that table alone doesn't.
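One untested idea we're considering is passing the URI through the SparkConf before any context is created, so the very first metastore connection already sees it ("spark.hadoop."-prefixed keys are copied into the Hadoop configuration; the hostname below is the same placeholder as above):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Set the metastore URI before the HiveContext exists, rather than via
// setConf afterwards. Keys prefixed with "spark.hadoop." are copied into
// the Hadoop configuration that the Hive client reads.
val sparkConf = new SparkConf()
  .setAppName("MyApp")
  .set("spark.hadoop.hive.metastore.uris", "thrift://other.emr.master:9083")

val sc = new SparkContext(sparkConf)
val sqlContext = new HiveContext(sc)

// If the URI is picked up at startup, no preliminary sql("use db1") call
// should be needed:
sqlContext.table("db2.table1").show()
```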

Does this sound familiar to anyone? Is there something we can do to avoid
having to call sql just to force it to connect/configure correctly?

Thanks,
Dave


Re: Spark SQL - Encoders - case class

2016-06-06 Thread Dave Maughan
Hi,

Thanks for the quick replies. I've tried those suggestions but Eclipse is
showing:

Unable to find encoder for type stored in a Dataset. Primitive
types (Int, String, etc) and Product types (case classes) are supported by
importing sqlContext.implicits._ Support for serializing other types will
be added in future.


Thanks

- Dave


Spark SQL - Encoders - case class

2016-06-06 Thread Dave Maughan
Hi,

I've figured out how to select data from a remote Hive instance and encode
the DataFrame -> Dataset using a Java POJO class:

TestHive.sql("select foo_bar as `fooBar` from table1")
  .as(Encoders.bean(classOf[Table1]))
  .show()

However, I'm struggling to find out how to do the equivalent in Scala if Table1
is a case class. Could someone please point me in the right direction?
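A sketch of what I'd expect the case-class version to look like, assuming the implicit encoders are brought into scope (untested on my side; Table1's field is just for illustration):

```scala
import org.apache.spark.sql.hive.test.TestHive

case class Table1(fooBar: String)

// Importing the SQLContext implicits puts an Encoder[Table1] in scope,
// so .as[Table1] can replace Encoders.bean(classOf[Table1])
import TestHive.implicits._

TestHive.sql("select foo_bar as `fooBar` from table1")
  .as[Table1]
  .show()
```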

Thanks
- Dave


Re: spark 1.6.0 connect to hive metastore

2016-03-09 Thread Dave Maughan
Hi,

We're having a similar issue. We have a standalone cluster running 1.5.2
with Hive working fine having dropped hive-site.xml into the conf folder.
We've just updated to 1.6.0, using the same configuration. Now when
starting a spark-shell we get the following:

java.lang.RuntimeException: java.lang.RuntimeException: Unable to
instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
  at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
  at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
  at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
  at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
  at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
  at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
  at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
  at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
  at $iwC$$iwC.<init>(<console>:15)
  at $iwC.<init>(<console>:24)
  at <init>(<console>:26)

On stepping though the code and enabling debug it shows that
hive.metastore.uris is not set:

DEBUG ClientWrapper: Hive Config: hive.metastore.uris=

So it looks like it's not finding hive-site.xml? Strangely, if I remove
hive-site.xml the exception does not occur, which implies that it WAS on
the classpath...
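For reference, a minimal hive-site.xml of the kind in question would contain little more than the metastore URI (host is a placeholder):

```xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.host:9083</value>
  </property>
</configuration>
```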

Dave



On Tue, 9 Feb 2016 at 22:26 Koert Kuipers  wrote:

> i do not have phoenix, but i wonder if its something related. will check
> my classpaths
>
> On Tue, Feb 9, 2016 at 5:00 PM, Benjamin Kim  wrote:
>
>> I got the same problem when I added the Phoenix plugin jar in the driver
>> and executor extra classpaths. Do you have those set too?
>>
>
>> On Feb 9, 2016, at 1:12 PM, Koert Kuipers  wrote:
>>
>> yes its not using derby i think: i can see the tables in my actual hive
>> metastore.
>>
>> i was using a symlink to /etc/hive/conf/hive-site.xml for my
>> hive-site.xml which has a lot more stuff than just hive.metastore.uris
>>
>> let me try your approach
>>
>>
>>
>> On Tue, Feb 9, 2016 at 3:57 PM, Alexandr Dzhagriev 
>> wrote:
>>
>>> I'm using spark 1.6.0, hive 1.2.1 and there is just one property in the
>>> hive-site.xml: hive.metastore.uris. Works for me. Can you check in the
>>> logs that when the HiveContext is created it connects to the correct uri
>>> and doesn't use derby?
>>>
>>> Cheers, Alex.
>>>
>>> On Tue, Feb 9, 2016 at 9:39 PM, Koert Kuipers  wrote:
>>>
 hey thanks. hive-site is on classpath in conf directory

 i currently got it to work by changing this hive setting in
 hive-site.xml:
 hive.metastore.schema.verification=true
 to
 hive.metastore.schema.verification=false

 this feels like a hack, because schema verification is a good thing i
 would assume?
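in hive-site.xml terms that change is just:

```xml
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>
```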

 On Tue, Feb 9, 2016 at 3:25 PM, Alexandr Dzhagriev 
 wrote:

> Hi Koert,
>
> As far as I can see you are using derby:
>
>  Using direct SQL, underlying DB is DERBY
>
> not mysql, which is used for the metastore. That means, spark couldn't
> find hive-site.xml on your classpath. Can you check that, please?
>
> Thanks, Alex.
>
> On Tue, Feb 9, 2016 at 8:58 PM, Koert Kuipers 
> wrote:
>
>> has anyone successfully connected to hive metastore using spark
>> 1.6.0? i am having no luck. worked fine with spark 1.5.1 for me. i am on
>> cdh 5.5 and launching spark with yarn.
>>
>> this is what i see in logs:
>> 16/02/09 14:49:12 INFO hive.metastore: Trying to connect to metastore
>> with URI thrift://metastore.mycompany.com:9083
>> 16/02/09 14:49:12 INFO hive.metastore: Connected to metastore.
>>
>> and then a little later:
>>
>> 16/02/09 14:49:34 INFO hive.HiveContext: Initializing execution hive,
>> version 1.2.1
>> 16/02/09 14:49:34 INFO client.ClientWrapper: Inspected Hadoop
>> version: 2.6.0-cdh5.4.4
>> 16/02/09 14:49:34 INFO client.ClientWrapper: Loaded
>>