aggregateByKey on PairRDD

2016-03-29 Thread Suniti Singh
Hi All,

I have an RDD with data in the following form:

tempRDD: RDD[(String, (String, String))]

(brand , (product, key))

("amazon",("book1","tech"))

("eBay",("book1","tech"))

("barns&noble",("book","tech"))

("amazon",("book2","tech"))


I would like to group the data by brand and get the result set in the
following format:

resultSetRDD: RDD[(String, List[(String, String)])]

I tried using aggregateByKey but I am not quite getting how to achieve
this. Or is there any other way to achieve it?

val resultSetRDD = tempRDD.aggregateByKey("")(
  { case (aggr, value) => aggr + String.valueOf(value) + "," },
  (aggr1, aggr2) => aggr1 + aggr2)

resultSetRDD = (amazon,("book1","tech"),("book2","tech"))
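
With an empty List[(String, String)] as the zero value, aggregateByKey can build
the list directly instead of concatenating strings, giving the desired
RDD[(String, List[(String, String)])]. A minimal sketch (local mode; the object
name BrandGrouping is just a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

object BrandGrouping {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("BrandGrouping").setMaster("local[*]"))

    val tempRDD = sc.parallelize(Seq(
      ("amazon", ("book1", "tech")),
      ("eBay", ("book1", "tech")),
      ("barns&noble", ("book", "tech")),
      ("amazon", ("book2", "tech"))))

    // Zero value is an empty list; seqOp prepends within a partition,
    // combOp concatenates the per-partition lists.
    val resultSetRDD = tempRDD.aggregateByKey(List.empty[(String, String)])(
      (acc, value) => value :: acc,
      (acc1, acc2) => acc1 ::: acc2)

    resultSetRDD.collect().foreach(println)
    // e.g. (amazon,List((book2,tech), (book1,tech)))

    sc.stop()
  }
}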

Thanks,

Suniti


Re: Compare a column in two different tables/find the distance between column data

2016-03-15 Thread Suniti Singh
The data in the title column is different between the tables, so correcting
the data in that column requires finding out what the correct value is and
then replacing it.

Finding the correct value could be tedious, but if some mechanism were in
place to group the partially matched values, that would help with the
further processing.

I am kind of stuck.
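
One option, as a rough sketch: Spark SQL ships a levenshtein() function, so
partially matched titles can be paired up by edit distance and then grouped.
The table/column names and the threshold of 3 below are placeholders, and
without a blocking key this is effectively a cross join, so it would need
pre-filtering at TB scale:

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{levenshtein, lower}

// sqlContext: an existing SQLContext/HiveContext; "table1", "table2" and
// "title" are hypothetical names standing in for the real schema.
def fuzzyTitleJoin(sqlContext: SQLContext): DataFrame = {
  val t1 = sqlContext.table("table1")
  val t2 = sqlContext.table("table2")

  // Keep pairs whose titles are within a small edit distance of each other.
  t1.join(t2, levenshtein(lower(t1("title")), lower(t2("title"))) <= 3)
}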



On Tue, Mar 15, 2016 at 10:50 AM, Suniti Singh 
wrote:

> Is it always the case that one title is a substring of another ? -- Not
> always. One title can have values like D.O.C, doctor_{areacode},
> doc_{dep,areacode}
>
> On Mon, Mar 14, 2016 at 10:39 PM, Wail Alkowaileet 
> wrote:
>
>> I think you need some sort of fuzzy join ?
>> Is it always the case that one title is a substring of another ?
>>
>> On Tue, Mar 15, 2016 at 6:46 AM, Suniti Singh 
>> wrote:
>>
>>> Hi All,
>>>
>>> I have two tables with the same schema but different data. I have to join
>>> the tables on one column and then do a group by on that same column.
>>>
>>> Now, the data in that column might or might not match exactly between the
>>> two tables. (For example, the column name is "title": Table1.title =
>>> "doctor" and Table2.title = "doc"; "doctor" and "doc" are actually the
>>> same title.)
>>>
>>> From a performance point of view, with data volume in the TBs, I am not
>>> sure I can achieve this with a SQL statement. What would be the best
>>> approach to solving this problem? Should I look at the MLlib APIs?
>>>
>>> Spark Gurus any pointers?
>>>
>>> Thanks,
>>> Suniti
>>>
>>>
>>>
>>
>>
>> --
>>
>> *Regards,*
>> Wail Alkowaileet
>>
>
>


Re: Compare a column in two different tables/find the distance between column data

2016-03-15 Thread Suniti Singh
Is it always the case that one title is a substring of another ? -- Not
always. One title can have values like D.O.C, doctor_{areacode},
doc_{dep,areacode}
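
One thing that might help before any distance-based matching is to normalize
the variants first (lower-case, drop dots, strip the _{...} suffixes), so that
D.O.C, doctor_{areacode} and doc_{dep,areacode} reduce to bare stems before
grouping. A rough sketch; the column name and regexes are assumptions about
the data:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lower, regexp_replace}

// Adds a title_norm column: lower-cased, dots removed, and anything from the
// first underscore onwards dropped ("D.O.C" -> "doc", "doctor_areacode" -> "doctor").
def withNormalizedTitle(df: DataFrame): DataFrame =
  df.withColumn("title_norm",
    regexp_replace(regexp_replace(lower(col("title")), "\\.", ""), "_.*$", ""))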

On Mon, Mar 14, 2016 at 10:39 PM, Wail Alkowaileet 
wrote:

> I think you need some sort of fuzzy join ?
> Is it always the case that one title is a substring of another ?
>
> On Tue, Mar 15, 2016 at 6:46 AM, Suniti Singh 
> wrote:
>
>> Hi All,
>>
>> I have two tables with the same schema but different data. I have to join
>> the tables on one column and then do a group by on that same column.
>>
>> Now, the data in that column might or might not match exactly between the
>> two tables. (For example, the column name is "title": Table1.title =
>> "doctor" and Table2.title = "doc"; "doctor" and "doc" are actually the
>> same title.)
>>
>> From a performance point of view, with data volume in the TBs, I am not
>> sure I can achieve this with a SQL statement. What would be the best
>> approach to solving this problem? Should I look at the MLlib APIs?
>>
>> Spark Gurus any pointers?
>>
>> Thanks,
>> Suniti
>>
>>
>>
>
>
> --
>
> *Regards,*
> Wail Alkowaileet
>


Compare a column in two different tables/find the distance between column data

2016-03-14 Thread Suniti Singh
Hi All,

I have two tables with the same schema but different data. I have to join
the tables on one column and then do a group by on that same column.

Now, the data in that column might or might not match exactly between the
two tables. (For example, the column name is "title": Table1.title =
"doctor" and Table2.title = "doc"; "doctor" and "doc" are actually the same
title.)

From a performance point of view, with data volume in the TBs, I am not
sure I can achieve this with a SQL statement. What would be the best
approach to solving this problem? Should I look at the MLlib APIs?

Spark Gurus any pointers?

Thanks,
Suniti


Re: spark 1.6.0 connect to hive metastore

2016-03-09 Thread Suniti Singh
hive 1.6.0 in embedded mode doesn't connect to the metastore --
https://issues.apache.org/jira/browse/SPARK-9686

https://forums.databricks.com/questions/6512/spark-160-not-able-to-connect-to-hive-metastore.html
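
For reference, the minimal hive-site.xml that Alexandr mentions further down
the thread (just hive.metastore.uris, placed in Spark's conf/ directory) would
look roughly like this; the thrift host and port are placeholders:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>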


On Wed, Mar 9, 2016 at 10:48 AM, Suniti Singh 
wrote:

> Hi,
>
> I am able to reproduce this error only when using Spark 1.6.0 and Hive
> 1.6.0. The hive-site.xml is on the classpath, but somehow Spark ignores it
> and starts using the default Derby metastore.
>
> 16/03/09 10:37:52 INFO MetaStoreDirectSql: Using direct SQL, underlying DB
> is DERBY
>
> 16/03/09 10:37:52 INFO ObjectStore: Initialized ObjectStore
>
> 16/03/09 10:37:52 WARN Hive: Failed to access metastore. This class should
> not accessed in runtime.
>
> org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.RuntimeException: Unable to instantiate
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>
> at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
>
> at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
>
> at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166)
>
> at org.apache.hadoop.hive.ql.session.SessionState.start(
> SessionState.java:503)
> at org.apache.spark.sql.hive.client.ClientWrapper.(
> ClientWrapper.scala:194)
>
> On Wed, Mar 9, 2016 at 9:00 AM, Dave Maughan 
> wrote:
>
>> Hi,
>>
>> We're having a similar issue. We have a standalone cluster running 1.5.2
>> with Hive working fine having dropped hive-site.xml into the conf folder.
>> We've just updated to 1.6.0, using the same configuration. Now when
>> starting a spark-shell we get the following:
>>
>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
>> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreCli
>> ent
>> at
>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>> at
>> org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:194)
>> at
>> org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
>> at
>> org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
>> at
>> org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
>> at
>> org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
>> at
>> org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
>> at
>> org.apache.spark.sql.UDFRegistration.(UDFRegistration.scala:40)
>> at org.apache.spark.sql.SQLContext.(SQLContext.scala:330)
>> at
>> org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
>> at
>> org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>> at
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>> at
>> org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
>> at $iwC$$iwC.(:15)
>> at $iwC.(:24)
>> at (:26)
>>
>> On stepping through the code and enabling debug it shows that
>> hive.metastore.uris is not set:
>>
>> DEBUG ClientWrapper: Hive Config: hive.metastore.uris=
>>
>> ..So it looks like it's not finding hive-site.xml? Weirdly, if I remove
>> hive-site.xml the exception does not occur which implies that it WAS on the
>> classpath...
>>
>> Dave
>>
>>
>>
>> On Tue, 9 Feb 2016 at 22:26 Koert Kuipers  wrote:
>>
>>> i do not have phoenix, but i wonder if its something related. will check
>>> my classpaths
>>>
>>> On Tue, Feb 9, 2016 at 5:00 PM, Benjamin Kim  wrote:
>>>
>>>> I got the same problem when I added the Phoenix plugin jar in the
>>>> driver and executor extra classpaths. Do you have those set too?
>>>>
>>>
>>>> On Feb 9, 2016, at 1:12 PM, Koert Kuipers  wrote:
>>>>
>>>> yes its not using derby i think: i can see the tables in my actual hive
>>>> metastore.
>>>>
>>>> i was using a symlink to /etc/hive/conf/hive-site.xml for my
>>>> hive-site.xml which 

Re: spark 1.6.0 connect to hive metastore

2016-03-09 Thread Suniti Singh
Hi,

I am able to reproduce this error only when using Spark 1.6.0 and Hive
1.6.0. The hive-site.xml is on the classpath, but somehow Spark ignores it
and starts using the default Derby metastore.

16/03/09 10:37:52 INFO MetaStoreDirectSql: Using direct SQL, underlying DB
is DERBY

16/03/09 10:37:52 INFO ObjectStore: Initialized ObjectStore

16/03/09 10:37:52 WARN Hive: Failed to access metastore. This class should
not accessed in runtime.

org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException:
Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)

at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)

at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166)

at org.apache.hadoop.hive.ql.session.SessionState.start(
SessionState.java:503)
at org.apache.spark.sql.hive.client.ClientWrapper.(
ClientWrapper.scala:194)

On Wed, Mar 9, 2016 at 9:00 AM, Dave Maughan 
wrote:

> Hi,
>
> We're having a similar issue. We have a standalone cluster running 1.5.2
> with Hive working fine having dropped hive-site.xml into the conf folder.
> We've just updated to 1.6.0, using the same configuration. Now when
> starting a spark-shell we get the following:
>
> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreCli
> ent
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
> at
> org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:194)
> at
> org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
> at
> org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
> at
> org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
> at
> org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
> at
> org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
> at
> org.apache.spark.sql.UDFRegistration.(UDFRegistration.scala:40)
> at org.apache.spark.sql.SQLContext.(SQLContext.scala:330)
> at
> org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90)
> at
> org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at
> org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
> at $iwC$$iwC.(:15)
> at $iwC.(:24)
> at (:26)
>
> On stepping through the code and enabling debug it shows that
> hive.metastore.uris is not set:
>
> DEBUG ClientWrapper: Hive Config: hive.metastore.uris=
>
> ..So it looks like it's not finding hive-site.xml? Weirdly, if I remove
> hive-site.xml the exception does not occur which implies that it WAS on the
> classpath...
>
> Dave
>
>
>
> On Tue, 9 Feb 2016 at 22:26 Koert Kuipers  wrote:
>
>> i do not have phoenix, but i wonder if its something related. will check
>> my classpaths
>>
>> On Tue, Feb 9, 2016 at 5:00 PM, Benjamin Kim  wrote:
>>
>>> I got the same problem when I added the Phoenix plugin jar in the driver
>>> and executor extra classpaths. Do you have those set too?
>>>
>>
>>> On Feb 9, 2016, at 1:12 PM, Koert Kuipers  wrote:
>>>
>>> yes its not using derby i think: i can see the tables in my actual hive
>>> metastore.
>>>
>>> i was using a symlink to /etc/hive/conf/hive-site.xml for my
>>> hive-site.xml which has a lot more stuff than just hive.metastore.uris
>>>
>>> let me try your approach
>>>
>>>
>>>
>>> On Tue, Feb 9, 2016 at 3:57 PM, Alexandr Dzhagriev 
>>> wrote:
>>>
 I'm using spark 1.6.0, hive 1.2.1 and there is just one property in the
 hive-site.xml hive.metastore.uris Works for me. Can you check in the
 logs, that when the HiveContext is created it connects to the correct uri
 and doesn't use derby.

 Cheers, Alex.

 On Tue, Feb 9, 2016 at 9:39 PM, Koert Kuipers 
 wrote:

> hey thanks. hive-site is on classpath in conf directory
>
> i currently got it to work by changing this hive setting in
> hive-site.xml:
> hive.metastore.schema.verification=true
> to
> hive.metastore.schema.verification=false
>
> this feels like a hack, because schema verification is a good thing i
> would assume?
>
> On Tue, Feb 9, 2016 at 3:25 PM, Alexandr Dzhagriev 
> wrote:
>
>> Hi Koert,
>>
>> As far as I can see you are using derby:
>>

Re: Using dynamic allocation and shuffle service in Standalone Mode

2016-03-08 Thread Suniti Singh
Please check the documentation for the configuration and setup:
http://spark.apache.org/docs/latest/job-scheduling.html#configuration-and-setup
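
For standalone mode specifically, the page above says to start the workers
themselves with spark.shuffle.service.enabled set to true (the --conf flags
passed to spark-submit only configure the driver and executors, not the
Worker daemons). A sketch of one way to do that, assuming the default conf/
and sbin/ layout:

# conf/spark-env.sh on every worker node
export SPARK_WORKER_OPTS="-Dspark.shuffle.service.enabled=true"

# restart the workers so they pick up the setting
sbin/stop-slaves.sh
sbin/start-slaves.sh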


On Tue, Mar 8, 2016 at 10:14 AM, Silvio Fiorito <
silvio.fior...@granturing.com> wrote:

> You’ve started the external shuffle service on all worker nodes, correct?
> Can you confirm they’re still running and haven’t exited?
>
>
>
>
>
>
>
> *From: *Yuval.Itzchakov 
> *Sent: *Tuesday, March 8, 2016 12:41 PM
> *To: *user@spark.apache.org
> *Subject: *Using dynamic allocation and shuffle service in Standalone Mode
>
>
> Hi,
> I'm using Spark 1.6.0, and according to the documentation, dynamic
> allocation and spark shuffle service should be enabled.
>
> When I submit a spark job via the following:
>
> spark-submit \
> --master  \
> --deploy-mode cluster \
> --executor-cores 3 \
> --conf "spark.streaming.backpressure.enabled=true" \
> --conf "spark.dynamicAllocation.enabled=true" \
> --conf "spark.dynamicAllocation.minExecutors=2" \
> --conf "spark.dynamicAllocation.maxExecutors=24" \
> --conf "spark.shuffle.service.enabled=true" \
> --conf "spark.executor.memory=8g" \
> --conf "spark.driver.memory=10g" \
> --class SparkJobRunner
>
> /opt/clicktale/entityCreator/com.clicktale.ai.entity-creator-assembly-0.0.2.jar
>
> I'm seeing error logs from the workers being unable to connect to the
> shuffle service:
>
> 16/03/08 17:33:15 ERROR storage.BlockManager: Failed to connect to external
> shuffle server, will retry 2 more times after waiting 5 seconds...
> java.io.IOException: Failed to connect to 
> at
>
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
> at
>
> org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:181)
> at
>
> org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:141)
> at
>
> org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:211)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
> at
>
> org.apache.spark.storage.BlockManager.registerWithExternalShuffleServer(BlockManager.scala:208)
> at
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:194)
> at org.apache.spark.executor.Executor.(Executor.scala:85)
> at
>
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)
> at
>
> org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at
> org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> I verified all relevant ports are open. Has anyone else experienced such a
> failure?
>
> Yuval.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Using-dynamic-allocation-and-shuffle-service-in-Standalone-Mode-tp26430.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Adding hive context gives error

2016-03-07 Thread Suniti Singh
Yeah, I realized it and changed the version to 1.6.0, as listed at
http://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10/1.6.0

I added the spark-sql dependency back to the pom.xml and the Scala code
works just fine.
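
For reference, the aligned dependency block in the pom.xml then looks like
this (same artifacts as in the original mail, with spark-sql bumped to match):

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.10</artifactId>
  <version>1.6.0</version>
</dependency>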



On Mon, Mar 7, 2016 at 5:00 PM, Tristan Nixon 
wrote:

> Hi Suniti,
>
> why are you mixing spark-sql version 1.2.0 with spark-core, spark-hive v
> 1.6.0?
>
> I’d suggest you try to keep all the libs at the same version.
>
> On Mar 7, 2016, at 6:15 PM, Suniti Singh  wrote:
>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-core_2.10</artifactId>
>   <version>1.6.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-sql_2.10</artifactId>
>   <version>1.2.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-hive_2.10</artifactId>
>   <version>1.6.0</version>
> </dependency>
>
>


Re: Adding hive context gives error

2016-03-07 Thread Suniti Singh
We do not need to add the external jars to Eclipse if Maven is used as the
build tool, since the Spark dependencies in the POM file will take care of it.



On Mon, Mar 7, 2016 at 4:50 PM, Mich Talebzadeh 
wrote:

> Hi Kabeer,
>
> I have not used eclipse for Spark/Scala although I have played with it.
>
> As a matter of interest when you set up an Eclipse project do you add
> external Jars to eclipse From $SPARK_HOME/lib only?
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 8 March 2016 at 00:45, Suniti Singh  wrote:
>
>> Thanks Mich and Kabeer for quick reply.
>>
>> @Kabeer - I removed the spark-sql dependency and all the errors are gone.
>> But I am surprised by this behaviour. Why is the spark-sql lib an issue
>> for including the hive context?
>>
>> Regards,
>> Suniti
>>
>> On Mon, Mar 7, 2016 at 4:34 PM, Kabeer Ahmed 
>> wrote:
>>
>>> I use SBT and I have never included spark-sql. The simple 2 lines in SBT
>>> are as below:
>>>
>>>
>>> libraryDependencies ++= Seq(
>>>   "org.apache.spark" %% "spark-core" % "1.5.0",
>>>   "org.apache.spark" %% "spark-hive" % "1.5.0")
>>>
>>>
>>>
>>> However, I do note that you are using Spark-sql include and the Spark
>>> version you use is 1.6.0. Can you please try with 1.5.0 to see if it works?
>>> I havent yet tried Spark 1.6.0.
>>>
>>>
>>> On 08/03/16 00:15, Suniti Singh wrote:
>>>
>>> Hi All,
>>>
>>> I am trying to create a hive context in a Scala program as follows in
>>> Eclipse. Note: I have added the Maven dependencies for spark-core,
>>> spark-hive, and spark-sql.
>>>
>>> import org.apache.spark.SparkConf
>>> import org.apache.spark.SparkContext
>>> import org.apache.spark.rdd.RDD.rddToPairRDDFunctions
>>>
>>> object DataExp {
>>>
>>>   def main(args: Array[String]) = {
>>>     val conf = new SparkConf().setAppName("DataExp").setMaster("local")
>>>     val sc = new SparkContext(conf)
>>>     val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>   }
>>> }
>>>
>>> I get the following errors at the hiveContext line in the program:
>>>
>>> 1 --- Error in Scala compiler: bad symbolic reference. A signature in
>>> HiveContext.class refers to term ui in package
>>> org.apache.spark.sql.execution which is not available. It may be completely
>>> missing from the current classpath, or the version on the classpath might
>>> be incompatible with the version used when compiling HiveContext.class.
>>> spark Unknown Scala Problem
>>> 2 --- SBT builder crashed while compiling. The error message is 'bad
>>> symbolic reference. A signature in HiveContext.class refers to term ui in
>>> package org.apache.spark.sql.execution which is not available. It may be
>>> completely missing from the current classpath, or the version on the
>>> classpath might be incompatible with the version used when compiling
>>> HiveContext.class.'. Check Error Log for details. spark Unknown Scala
>>> Problem
>>>
>>> 3 --- while compiling:
>>> /Users/sunitisingh/sparktest/spark/src/main/scala/com/sparktest/spark/DataExp.scala
>>> during phase: erasure  library version: version 2.10.6
>>> compiler version: version 2.10.6   reconstructed args: -javabootclasspath
>>> /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home

Re: Adding hive context gives error

2016-03-07 Thread Suniti Singh
Thanks Mich and Kabeer for quick reply.

@Kabeer - I removed the spark-sql dependency and all the errors are gone.
But I am surprised by this behaviour. Why is the spark-sql lib an issue for
including the hive context?

Regards,
Suniti

On Mon, Mar 7, 2016 at 4:34 PM, Kabeer Ahmed 
wrote:

> I use SBT and I have never included spark-sql. The simple 2 lines in SBT
> are as below:
>
>
> libraryDependencies ++= Seq(
>   "org.apache.spark" %% "spark-core" % "1.5.0",
>   "org.apache.spark" %% "spark-hive" % "1.5.0")
>
>
>
> However, I do note that you are using Spark-sql include and the Spark
> version you use is 1.6.0. Can you please try with 1.5.0 to see if it works?
> I havent yet tried Spark 1.6.0.
>
>
> On 08/03/16 00:15, Suniti Singh wrote:
>
> Hi All,
>
> I am trying to create a hive context in a Scala program as follows in
> Eclipse. Note: I have added the Maven dependencies for spark-core,
> spark-hive, and spark-sql.
>
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.rdd.RDD.rddToPairRDDFunctions
>
> object DataExp {
>
>   def main(args: Array[String]) = {
>     val conf = new SparkConf().setAppName("DataExp").setMaster("local")
>     val sc = new SparkContext(conf)
>     val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>   }
> }
>
> I get the following errors at the hiveContext line in the program:
>
> 1 --- Error in Scala compiler: bad symbolic reference. A signature in
> HiveContext.class refers to term ui in package
> org.apache.spark.sql.execution which is not available. It may be completely
> missing from the current classpath, or the version on the classpath might
> be incompatible with the version used when compiling HiveContext.class.
> spark Unknown Scala Problem
> 2 --- SBT builder crashed while compiling. The error message is 'bad
> symbolic reference. A signature in HiveContext.class refers to term ui in
> package org.apache.spark.sql.execution which is not available. It may be
> completely missing from the current classpath, or the version on the
> classpath might be incompatible with the version used when compiling
> HiveContext.class.'. Check Error Log for details. spark Unknown Scala
> Problem
>
> 3 --- while compiling:
> /Users/sunitisingh/sparktest/spark/src/main/scala/com/sparktest/spark/DataExp.scala
> during phase: erasure  library version: version 2.10.6
> compiler version: version 2.10.6   reconstructed args: -javabootclasspath
> /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/jaccess.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/jfxrt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.
> jdk/Contents/Home/jre/lib/ext/nashorn.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/System/Library/Java/Extensions/AppleScriptEngine.jar:/System/Library/Java/Extensions/dns_sd.jar:/System/Library/Java/Extensions/j3daudio.jar:/System/Library/Java/Extensions/j3dcore.jar:/System/Library/Java/Extensions/j3dutils.jar:/System/Library/Java/Extensions/jai_codec.jar:/System/Library/Java/Extensions/jai_core.jar:/System/Library/Java/Extensions/mlibwrapper_jai.jar:/System/Library/Java/Extensions/MRJToolkit.jar:/System/Library/Java/Extensions/vecmath.jar
> -classpath
> /Users/sunitisingh/sparktest/spark/target/classes:/Users/sunitisingh/sparktest/spark/target/test-classes:/Users/sunitisingh/.m2/repository/org/apache/spark/spark-core_2.10/1.6.0/spark-core_2.10-1.6.0.jar:/Users/sunitisingh/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/Users/sunitisingh/.m2/repository/o

Adding hive context gives error

2016-03-07 Thread Suniti Singh
Hi All,

I am trying to create a hive context in a Scala program as follows in
Eclipse. Note: I have added the Maven dependencies for spark-core,
spark-hive, and spark-sql.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions

object DataExp {

  def main(args: Array[String]) = {
    val conf = new SparkConf().setAppName("DataExp").setMaster("local")
    val sc = new SparkContext(conf)
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
  }
}

I get the following errors at the hiveContext line in the program:

1 --- Error in Scala compiler: bad symbolic reference. A signature in
HiveContext.class refers to term ui in package
org.apache.spark.sql.execution which is not available. It may be completely
missing from the current classpath, or the version on the classpath might
be incompatible with the version used when compiling HiveContext.class.
spark Unknown Scala Problem
2 --- SBT builder crashed while compiling. The error message is 'bad
symbolic reference. A signature in HiveContext.class refers to term ui in
package org.apache.spark.sql.execution which is not available. It may be
completely missing from the current classpath, or the version on the
classpath might be incompatible with the version used when compiling
HiveContext.class.'. Check Error Log for details. spark Unknown Scala
Problem

3 --- while compiling:
/Users/sunitisingh/sparktest/spark/src/main/scala/com/sparktest/spark/DataExp.scala
during phase: erasure  library version: version 2.10.6
compiler version: version 2.10.6   reconstructed args: -javabootclasspath
/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/jaccess.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/jfxrt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/nashorn.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/System/Library/Java/Extensions/AppleScriptEngine.jar:/System/Library/Java/Extensions/dns_sd.jar:/System/Library/Java/Extensions/j3daudio.jar:/System/Library/Java/Extensions/j3dcore.jar:/System/Library/Java/Extensions/j3dutils.jar:/System/Library/Java/Extensions/jai_codec.jar:/System/Library/Java/Extensions/jai_core.jar:/System/Library/Java/Extensions/mlibwrapper_jai.jar:/System/Library/Java/Extensions/MRJToolkit.jar:/System/Library/Java/Extensions/vecmath.jar
-classpath
/Users/sunitisingh/sparktest/spark/target/classes:/Users/sunitisingh/sparktest/spark/target/test-classes:/Users/sunitisingh/.m2/repository/org/apache/spark/spark-core_2.10/1.6.0/spark-core_2.10-1.6.0.jar:/Users/sunitisingh/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/Users/sunitisingh/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/Users/sunitisingh/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar:/Users/sunitisingh/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar:/Users/sunitisingh/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/Users/sunitisingh/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/Users/sunitisingh/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar:/Users/sunitisingh/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/Users/sunitisingh/.m2/repository/org/objenesis/objenesis/1.2/objenesis-1.2.jar:/Users/sunitisingh/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/Users/sunitisingh/.m2/repository/org/apache/xbean/xbean-asm5-shaded/4.4/xbean-asm5-shaded-4.4.jar:/Users/sunitisingh/.m2/repository/org/apache/hadoop/hadoop-client/2.2.0/hadoop-client-2.2.0.jar:/Users/sunitisingh/.m2/repository/org/apache/hadoop/hadoop-common/2.2.0/hadoop-common-2.2.0.jar:/Users/sunitisingh/.m2/repository/org/apache/commons/commons-math/2.1/commons-m