aggregateByKey on PairRDD
Hi All,

I have an RDD with data in the following form:

tempRDD: RDD[(String, (String, String))]   // (brand, (product, key))

("amazon", ("book1", "tech"))
("eBay", ("book1", "tech"))
("barns&noble", ("book", "tech"))
("amazon", ("book2", "tech"))

I would like to group the data by brand and get the result set in the following format:

resultSetRDD: RDD[(String, List[(String, String)])]

I tried using aggregateByKey but can't quite see how to achieve this. Is there any other way? My attempt concatenates the values into a String instead of building a List:

val resultSetRDD = tempRDD.aggregateByKey("")(
  { case (aggr, value) => aggr + String.valueOf(value) + "," },
  (aggr1, aggr2) => aggr1 + aggr2)

The result I want looks like:

(amazon, List(("book1", "tech"), ("book2", "tech")))

Thanks,
Suniti
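A hedged sketch of one way to get a List per brand: give aggregateByKey a List zero value instead of "". The seqOp/combOp below are my assumptions about the intended aggregation, and the fold at the end only simulates locally what aggregateByKey would do on the cluster.

```scala
// Sketch (not the poster's code): accumulate (product, key) pairs into a List per brand.
val seqOp  = (acc: List[(String, String)], v: (String, String)) => v :: acc   // fold one value in
val combOp = (a: List[(String, String)], b: List[(String, String)]) => a ::: b // merge partitions

// On the cluster this would be:
//   val resultSetRDD = tempRDD.aggregateByKey(List.empty[(String, String)])(seqOp, combOp)

// Local simulation of the same aggregation on plain Scala collections:
val data = Seq(
  ("amazon",      ("book1", "tech")),
  ("eBay",        ("book1", "tech")),
  ("barns&noble", ("book",  "tech")),
  ("amazon",      ("book2", "tech")))
val result = data.foldLeft(Map.empty[String, List[(String, String)]]) {
  case (m, (k, v)) => m.updated(k, seqOp(m.getOrElse(k, Nil), v))
}
// result("amazon") now contains both ("book1","tech") and ("book2","tech")
```

Note that groupByKey would give the same shape directly; aggregateByKey mainly pays off when the combine functions shrink the data, which a plain List accumulator does not.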
Re: Compare a column in two different tables/find the distance between column data
The data in the title column differs between the tables, so correcting it requires first determining the correct value and then replacing. Finding the correct value could be tedious, but a mechanism that groups the partially matched values would help with the further processing. That is where I am stuck.

On Tue, Mar 15, 2016 at 10:50 AM, Suniti Singh wrote:

> Is it always the case that one title is a substring of another ? -- Not
> always. One title can have values like D.O.C, doctor_{areacode},
> doc_{dep,areacode}
>
> On Mon, Mar 14, 2016 at 10:39 PM, Wail Alkowaileet wrote:
>
>> I think you need some sort of fuzzy join ?
>> Is it always the case that one title is a substring of another ?
>>
>> On Tue, Mar 15, 2016 at 6:46 AM, Suniti Singh wrote:
>>
>>> Hi All,
>>>
>>> I have two tables with the same schema but different data. I have to join
>>> the tables based on one column and then do a group by on the same column name.
>>>
>>> Now the data in that column in the two tables might or might not exactly
>>> match. (Ex - the column name is "title", Table1.title = "doctor" and
>>> Table2.title = "doc"; doctor and doc are actually the same title.)
>>>
>>> From a performance point of view, where the data volume is in TB, I am not
>>> sure I can achieve this with a SQL statement. What would be the best
>>> approach to this problem? Should I look at the MLlib APIs?
>>>
>>> Spark Gurus, any pointers?
>>>
>>> Thanks,
>>> Suniti
>>
>> --
>> *Regards,*
>> Wail Alkowaileet
Re: Compare a column in two different tables/find the distance between column data
Is it always the case that one title is a substring of another ? -- Not always. One title can have values like D.O.C, doctor_{areacode}, doc_{dep,areacode} On Mon, Mar 14, 2016 at 10:39 PM, Wail Alkowaileet wrote: > I think you need some sort of fuzzy join ? > Is it always the case that one title is a substring of another ? > > On Tue, Mar 15, 2016 at 6:46 AM, Suniti Singh > wrote: > >> Hi All, >> >> I have two tables with same schema but different data. I have to join the >> tables based on one column and then do a group by the same column name. >> >> now the data in that column in two table might/might not exactly match. >> (Ex - column name is "title". Table1. title = "doctor" and Table2. title >> = "doc") doctor and doc are actually same titles. >> >> From performance point of view where i have data volume in TB , i am not >> sure if i can achieve this using the sql statement. What would be the best >> approach of solving this problem. Should i look for MLLIB apis? >> >> Spark Gurus any pointers? >> >> Thanks, >> Suniti >> >> >> > > > -- > > *Regards,* > Wail Alkowaileet >
Compare a column in two different tables/find the distance between column data
Hi All,

I have two tables with the same schema but different data. I have to join the tables based on one column and then do a group by on the same column name.

Now the data in that column in the two tables might or might not exactly match. (Ex - the column name is "title", Table1.title = "doctor" and Table2.title = "doc"; doctor and doc are actually the same title.)

From a performance point of view, where the data volume is in TB, I am not sure I can achieve this with a SQL statement. What would be the best approach to this problem? Should I look at the MLlib APIs?

Spark Gurus, any pointers?

Thanks,
Suniti
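To make the "fuzzy join" idea from the replies concrete: one common distance measure for partially matching strings is Levenshtein edit distance, with pairs under some threshold treated as the same title. The function below is a plain-Scala sketch for illustration; on DataFrames, Spark SQL's built-in levenshtein column function (available since 1.5) could play the same role in a join condition. The sameTitle threshold is just an assumption, not a recommendation.

```scala
// Classic dynamic-programming Levenshtein edit distance (sketch for illustration).
def levenshtein(a: String, b: String): Int = {
  val dp = Array.tabulate(a.length + 1, b.length + 1) { (i, j) =>
    if (j == 0) i else if (i == 0) j else 0   // base cases: distance to the empty string
  }
  for (i <- 1 to a.length; j <- 1 to b.length)
    dp(i)(j) = math.min(
      math.min(dp(i - 1)(j) + 1, dp(i)(j - 1) + 1),             // deletion / insertion
      dp(i - 1)(j - 1) + (if (a(i - 1) == b(j - 1)) 0 else 1))  // substitution (free on match)
  dp(a.length)(b.length)
}

// Hypothetical grouping rule: titles match when the edit distance is
// at most half the longer title's length. Tune this for real data.
def sameTitle(x: String, y: String): Boolean =
  levenshtein(x.toLowerCase, y.toLowerCase) <= math.max(x.length, y.length) / 2
```

With this rule, sameTitle("doctor", "doc") holds (distance 3, threshold 3) while clearly different titles fall outside the threshold. For TB-scale data an all-pairs comparison is infeasible, so some blocking step (e.g. by prefix) would be needed before applying the distance.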
Re: spark 1.6.0 connect to hive metastore
hive 1.6.0 in embedded mode doesn't connect to the metastore --
https://issues.apache.org/jira/browse/SPARK-9686
https://forums.databricks.com/questions/6512/spark-160-not-able-to-connect-to-hive-metastore.html

On Wed, Mar 9, 2016 at 10:48 AM, Suniti Singh wrote:

> Hi,
>
> I am able to reproduce this error only when using spark 1.6.0 and hive
> 1.6.0. The hive-site.xml is in the classpath but somehow spark skips the
> classpath search for hive-site.xml and starts using the default Derby metastore.
>
> 16/03/09 10:37:52 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
> 16/03/09 10:37:52 INFO ObjectStore: Initialized ObjectStore
> 16/03/09 10:37:52 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
> at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
> at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
> at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
>
> On Wed, Mar 9, 2016 at 9:00 AM, Dave Maughan wrote:
>
>> Hi,
>>
>> We're having a similar issue. We have a standalone cluster running 1.5.2
>> with Hive working fine, having dropped hive-site.xml into the conf folder.
>> We've just updated to 1.6.0, using the same configuration.
Now when >> starting a spark-shell we get the following: >> >> java.lang.RuntimeException: java.lang.RuntimeException: Unable to >> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreCli >> ent >> at >> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) >> at >> org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:194) >> at >> org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238) >> at >> org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218) >> at >> org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208) >> at >> org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462) >> at >> org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461) >> at >> org.apache.spark.sql.UDFRegistration.(UDFRegistration.scala:40) >> at org.apache.spark.sql.SQLContext.(SQLContext.scala:330) >> at >> org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90) >> at >> org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101) >> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native >> Method) >> at >> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) >> at >> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) >> at java.lang.reflect.Constructor.newInstance(Constructor.java:422) >> at >> org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028) >> at $iwC$$iwC.(:15) >> at $iwC.(:24) >> at (:26) >> >> On stepping though the code and enabling debug it shows that >> hive.metastore.uris is not set: >> >> DEBUG ClientWrapper: Hive Config: hive.metastore.uris= >> >> ..So it looks like it's not finding hive-site.xml? Weirdly, if I remove >> hive-site.xml the exception does not occur which implies that it WAS on the >> classpath... 
>> >> Dave >> >> >> >> On Tue, 9 Feb 2016 at 22:26 Koert Kuipers wrote: >> >>> i do not have phoenix, but i wonder if its something related. will check >>> my classpaths >>> >>> On Tue, Feb 9, 2016 at 5:00 PM, Benjamin Kim wrote: >>> >>>> I got the same problem when I added the Phoenix plugin jar in the >>>> driver and executor extra classpaths. Do you have those set too? >>>> >>> >>>> On Feb 9, 2016, at 1:12 PM, Koert Kuipers wrote: >>>> >>>> yes its not using derby i think: i can see the tables in my actual hive >>>> metastore. >>>> >>>> i was using a symlink to /etc/hive/conf/hive-site.xml for my >>>> hive-site.xml which
Re: spark 1.6.0 connect to hive metastore
Hi, I am able to reproduce this error only when using spark 1.6.0 and hive 1.6.0. The hive-site.xml is in the classpath but somehow spark rejects the classpath search for hive-site.xml and start using the default metastore Derby. 16/03/09 10:37:52 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY 16/03/09 10:37:52 INFO ObjectStore: Initialized ObjectStore 16/03/09 10:37:52 WARN Hive: Failed to access metastore. This class should not accessed in runtime. org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166) at org.apache.hadoop.hive.ql.session.SessionState.start( SessionState.java:503) at org.apache.spark.sql.hive.client.ClientWrapper.( ClientWrapper.scala:194) On Wed, Mar 9, 2016 at 9:00 AM, Dave Maughan wrote: > Hi, > > We're having a similar issue. We have a standalone cluster running 1.5.2 > with Hive working fine having dropped hive-site.xml into the conf folder. > We've just updated to 1.6.0, using the same configuration. 
Now when > starting a spark-shell we get the following: > > java.lang.RuntimeException: java.lang.RuntimeException: Unable to > instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreCli > ent > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) > at > org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:194) > at > org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238) > at > org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218) > at > org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208) > at > org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462) > at > org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461) > at > org.apache.spark.sql.UDFRegistration.(UDFRegistration.scala:40) > at org.apache.spark.sql.SQLContext.(SQLContext.scala:330) > at > org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:90) > at > org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:101) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028) > at $iwC$$iwC.(:15) > at $iwC.(:24) > at (:26) > > On stepping though the code and enabling debug it shows that > hive.metastore.uris is not set: > > DEBUG ClientWrapper: Hive Config: hive.metastore.uris= > > ..So it looks like it's not finding hive-site.xml? Weirdly, if I remove > hive-site.xml the exception does not occur which implies that it WAS on the > classpath... 
> > Dave > > > > On Tue, 9 Feb 2016 at 22:26 Koert Kuipers wrote: > >> i do not have phoenix, but i wonder if its something related. will check >> my classpaths >> >> On Tue, Feb 9, 2016 at 5:00 PM, Benjamin Kim wrote: >> >>> I got the same problem when I added the Phoenix plugin jar in the driver >>> and executor extra classpaths. Do you have those set too? >>> >> >>> On Feb 9, 2016, at 1:12 PM, Koert Kuipers wrote: >>> >>> yes its not using derby i think: i can see the tables in my actual hive >>> metastore. >>> >>> i was using a symlink to /etc/hive/conf/hive-site.xml for my >>> hive-site.xml which has a lot more stuff than just hive.metastore.uris >>> >>> let me try your approach >>> >>> >>> >>> On Tue, Feb 9, 2016 at 3:57 PM, Alexandr Dzhagriev >>> wrote: >>> I'm using spark 1.6.0, hive 1.2.1 and there is just one property in the hive-site.xml hive.metastore.uris Works for me. Can you check in the logs, that when the HiveContext is created it connects to the correct uri and doesn't use derby. Cheers, Alex. On Tue, Feb 9, 2016 at 9:39 PM, Koert Kuipers wrote: > hey thanks. hive-site is on classpath in conf directory > > i currently got it to work by changing this hive setting in > hive-site.xml: > hive.metastore.schema.verification=true > to > hive.metastore.schema.verification=false > > this feels like a hack, because schema verification is a good thing i > would assume? > > On Tue, Feb 9, 2016 at 3:25 PM, Alexandr Dzhagriev > wrote: > >> Hi Koert, >> >> As far as I can see you are using derby: >>
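Following Alexandr's note in the thread that a single property suffices, a minimal hive-site.xml on the driver classpath might look like the fragment below. The thrift host and port are placeholders for wherever the metastore is actually running:

```xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- placeholder: point at your running Hive metastore service -->
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```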
Re: Using dynamic allocation and shuffle service in Standalone Mode
Please check the document for the configuration - http://spark.apache.org/docs/latest/job-scheduling.html#configuration-and-setup On Tue, Mar 8, 2016 at 10:14 AM, Silvio Fiorito < silvio.fior...@granturing.com> wrote: > You’ve started the external shuffle service on all worker nodes, correct? > Can you confirm they’re still running and haven’t exited? > > > > > > > > *From: *Yuval.Itzchakov > *Sent: *Tuesday, March 8, 2016 12:41 PM > *To: *user@spark.apache.org > *Subject: *Using dynamic allocation and shuffle service in Standalone Mode > > > Hi, > I'm using Spark 1.6.0, and according to the documentation, dynamic > allocation and spark shuffle service should be enabled. > > When I submit a spark job via the following: > > spark-submit \ > --master \ > --deploy-mode cluster \ > --executor-cores 3 \ > --conf "spark.streaming.backpressure.enabled=true" \ > --conf "spark.dynamicAllocation.enabled=true" \ > --conf "spark.dynamicAllocation.minExecutors=2" \ > --conf "spark.dynamicAllocation.maxExecutors=24" \ > --conf "spark.shuffle.service.enabled=true" \ > --conf "spark.executor.memory=8g" \ > --conf "spark.driver.memory=10g" \ > --class SparkJobRunner > > /opt/clicktale/entityCreator/com.clicktale.ai.entity-creator-assembly-0.0.2.jar > > I'm seeing error logs from the workers being unable to connect to the > shuffle service: > > 16/03/08 17:33:15 ERROR storage.BlockManager: Failed to connect to external > shuffle server, will retry 2 more times after waiting 5 seconds... 
> java.io.IOException: Failed to connect to > at > > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216) > at > > org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:181) > at > > org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:141) > at > > org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:211) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) > at > > org.apache.spark.storage.BlockManager.registerWithExternalShuffleServer(BlockManager.scala:208) > at > org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:194) > at org.apache.spark.executor.Executor.(Executor.scala:85) > at > > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83) > at > > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) > at > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > > I verified all relevant ports are open. Has anyone else experienced such a > failure? > > Yuval. > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Using-dynamic-allocation-and-shuffle-service-in-Standalone-Mode-tp26430.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > >
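Per the job-scheduling docs linked above, in standalone mode the external shuffle service must be running on every worker before dynamically-allocated executors can register with it, which matches the registration failure in the log. A sketch of the setup; the script name and the default port 7337 are as I recall them from the Spark 1.6 docs, so verify against your install:

```shell
# On every worker node: enable the service in the worker's Spark config ...
echo "spark.shuffle.service.enabled true" >> "$SPARK_HOME/conf/spark-defaults.conf"

# ... and start the external shuffle service bundled with the distribution
"$SPARK_HOME/sbin/start-shuffle-service.sh"

# Sanity check: the service listens on spark.shuffle.service.port (default 7337)
nc -z worker-host 7337 && echo "shuffle service reachable"
```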
Re: Adding hive context gives error
Yeah, I realized it and changed the version to 1.6.0 as mentioned in
http://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10/1.6.0
I added the spark-sql dependency back to the pom.xml and the Scala code works just fine.

On Mon, Mar 7, 2016 at 5:00 PM, Tristan Nixon wrote:

> Hi Suniti,
>
> why are you mixing spark-sql version 1.2.0 with spark-core, spark-hive v 1.6.0?
>
> I'd suggest you try to keep all the libs at the same version.
>
> On Mar 7, 2016, at 6:15 PM, Suniti Singh wrote:
>
> > <dependency>
> >   <groupId>org.apache.spark</groupId>
> >   <artifactId>spark-core_2.10</artifactId>
> >   <version>1.6.0</version>
> > </dependency>
> > <dependency>
> >   <groupId>org.apache.spark</groupId>
> >   <artifactId>spark-sql_2.10</artifactId>
> >   <version>1.2.0</version>
> > </dependency>
> > <dependency>
> >   <groupId>org.apache.spark</groupId>
> >   <artifactId>spark-hive_2.10</artifactId>
> >   <version>1.6.0</version>
> > </dependency>
Re: Adding hive context gives error
We do not need to add the external jars to eclipse if maven is used as a Build tool since the spark dependency in POM file will take care of it. On Mon, Mar 7, 2016 at 4:50 PM, Mich Talebzadeh wrote: > Hi Kabeer, > > I have not used eclipse for Spark/Scala although I have played with it. > > As a matter of interest when you set up an Eclipse project do you add > external Jars to eclipse From $SPARK_HOME/lib only? > > Thanks > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > > On 8 March 2016 at 00:45, Suniti Singh wrote: > >> Thanks Mich and Kabeer for quick reply. >> >> @ Kabeer - i removed the spark - sql dependency and all the errors are >> gone. But i am surprised to see this behaviour. Why spark-sql lib are an >> issue for including the hive context? >> >> Regards, >> Suniti >> >> On Mon, Mar 7, 2016 at 4:34 PM, Kabeer Ahmed >> wrote: >> >>> I use SBT and I have never included spark-sql. The simple 2 lines in SBT >>> are as below: >>> >>> >>> libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % "1.5.0", >>> "org.apache.spark" %% "spark-hive" % "1.5.0") >>> >>> >>> >>> However, I do note that you are using Spark-sql include and the Spark >>> version you use is 1.6.0. Can you please try with 1.5.0 to see if it works? >>> I havent yet tried Spark 1.6.0. >>> >>> >>> On 08/03/16 00:15, Suniti Singh wrote: >>> >>> Hi All, >>> >>> I am trying to create a hive context in a scala prog as follows in >>> eclipse: >>> Note -- i have added the maven dependency for spark -core , hive , and >>> sql. 
>>> >>> import org.apache.spark.SparkConf >>> >>> import org.apache.spark.SparkContext >>> >>> import org.apache.spark.rdd.RDD.rddToPairRDDFunctions >>> >>> object DataExp { >>> >>>def main(args: Array[String]) = { >>> >>> val conf = new SparkConf().setAppName("DataExp").setMaster("local" >>> ) >>> >>> val sc = new SparkContext(conf) >>> >>> * val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)* >>> >>> >>> } >>> >>> } >>> >>> I get the the following *errors*: @ line of hiveContext above in the >>> prog >>> >>> 1 --- Error in Scala compiler: bad symbolic reference. A signature in >>> HiveContext.class refers to term ui in package >>> org.apache.spark.sql.execution which is not available. It may be completely >>> missing from the current classpath, or the version on the classpath might >>> be incompatible with the version used when compiling HiveContext.class. >>> spark Unknown Scala Problem >>> 2 --- SBT builder crashed while compiling. The error message is 'bad >>> symbolic reference. A signature in HiveContext.class refers to term ui in >>> package org.apache.spark.sql.execution which is not available. It may be >>> completely missing from the current classpath, or the version on the >>> classpath might be incompatible with the version used when compiling >>> HiveContext.class.'. Check Error Log for details. 
spark Unknown Scala >>> Problem >>> >>> 3 --- while compiling: >>> /Users/sunitisingh/sparktest/spark/src/main/scala/com/sparktest/spark/DataExp.scala >>> during phase: erasure library version: version 2.10.6 >>> compiler version: version 2.10.6 reconstructed args: -javabootclasspath >>> /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home
Re: Adding hive context gives error
Thanks Mich and Kabeer for quick reply. @ Kabeer - i removed the spark - sql dependency and all the errors are gone. But i am surprised to see this behaviour. Why spark-sql lib are an issue for including the hive context? Regards, Suniti On Mon, Mar 7, 2016 at 4:34 PM, Kabeer Ahmed wrote: > I use SBT and I have never included spark-sql. The simple 2 lines in SBT > are as below: > > > libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % "1.5.0", > "org.apache.spark" %% "spark-hive" % "1.5.0") > > > > However, I do note that you are using Spark-sql include and the Spark > version you use is 1.6.0. Can you please try with 1.5.0 to see if it works? > I havent yet tried Spark 1.6.0. > > > On 08/03/16 00:15, Suniti Singh wrote: > > Hi All, > > I am trying to create a hive context in a scala prog as follows in eclipse: > Note -- i have added the maven dependency for spark -core , hive , and > sql. > > import org.apache.spark.SparkConf > > import org.apache.spark.SparkContext > > import org.apache.spark.rdd.RDD.rddToPairRDDFunctions > > object DataExp { > >def main(args: Array[String]) = { > > val conf = new SparkConf().setAppName("DataExp").setMaster("local") > > val sc = new SparkContext(conf) > > * val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)* > > > } > > } > > I get the the following *errors*: @ line of hiveContext above in the prog > > 1 --- Error in Scala compiler: bad symbolic reference. A signature in > HiveContext.class refers to term ui in package > org.apache.spark.sql.execution which is not available. It may be completely > missing from the current classpath, or the version on the classpath might > be incompatible with the version used when compiling HiveContext.class. > spark Unknown Scala Problem > 2 --- SBT builder crashed while compiling. The error message is 'bad > symbolic reference. A signature in HiveContext.class refers to term ui in > package org.apache.spark.sql.execution which is not available. 
It may be > completely missing from the current classpath, or the version on the > classpath might be incompatible with the version used when compiling > HiveContext.class.'. Check Error Log for details. spark Unknown Scala > Problem > > 3 --- while compiling: > /Users/sunitisingh/sparktest/spark/src/main/scala/com/sparktest/spark/DataExp.scala > during phase: erasure library version: version 2.10.6 > compiler version: version 2.10.6 reconstructed args: -javabootclasspath > /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/jaccess.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/jfxrt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60. 
> jdk/Contents/Home/jre/lib/ext/nashorn.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/System/Library/Java/Extensions/AppleScriptEngine.jar:/System/Library/Java/Extensions/dns_sd.jar:/System/Library/Java/Extensions/j3daudio.jar:/System/Library/Java/Extensions/j3dcore.jar:/System/Library/Java/Extensions/j3dutils.jar:/System/Library/Java/Extensions/jai_codec.jar:/System/Library/Java/Extensions/jai_core.jar:/System/Library/Java/Extensions/mlibwrapper_jai.jar:/System/Library/Java/Extensions/MRJToolkit.jar:/System/Library/Java/Extensions/vecmath.jar > -classpath > /Users/sunitisingh/sparktest/spark/target/classes:/Users/sunitisingh/sparktest/spark/target/test-classes:/Users/sunitisingh/.m2/repository/org/apache/spark/spark-core_2.10/1.6.0/spark-core_2.10-1.6.0.jar:/Users/sunitisingh/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/Users/sunitisingh/.m2/repository/o
Adding hive context gives error
Hi All,

I am trying to create a hive context in a Scala program as follows in Eclipse. Note -- I have added the Maven dependencies for spark-core, spark-hive, and spark-sql.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions

object DataExp {
  def main(args: Array[String]) = {
    val conf = new SparkConf().setAppName("DataExp").setMaster("local")
    val sc = new SparkContext(conf)
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
  }
}

I get the following errors at the hiveContext line above:

1 --- Error in Scala compiler: bad symbolic reference. A signature in HiveContext.class refers to term ui in package org.apache.spark.sql.execution which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling HiveContext.class. spark Unknown Scala Problem

2 --- SBT builder crashed while compiling. The error message is 'bad symbolic reference. A signature in HiveContext.class refers to term ui in package org.apache.spark.sql.execution which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling HiveContext.class.'. Check Error Log for details.
spark Unknown Scala Problem 3 --- while compiling: /Users/sunitisingh/sparktest/spark/src/main/scala/com/sparktest/spark/DataExp.scala during phase: erasure library version: version 2.10.6 compiler version: version 2.10.6 reconstructed args: -javabootclasspath /Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/jaccess.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/jfxrt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/nashorn.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/System/Library/Java/Extensions/AppleScriptEngine.jar:/System/Library/Java/Extensions/dns_sd.jar:/System/Library/Java/Extensions/j3daudio.jar:/System/Library/Java/Extensions/j3dcore.jar:/System/Library/Java/Extensions/j3dutils.jar:/System/Library/Java/Extensions/jai_codec.jar:/System/Library/Java/Extensions/jai_core.jar:/System/Library/Java/Extensions/mlibwrap
per_jai.jar:/System/Library/Java/Extensions/MRJToolkit.jar:/System/Library/Java/Extensions/vecmath.jar -classpath /Users/sunitisingh/sparktest/spark/target/classes:/Users/sunitisingh/sparktest/spark/target/test-classes:/Users/sunitisingh/.m2/repository/org/apache/spark/spark-core_2.10/1.6.0/spark-core_2.10-1.6.0.jar:/Users/sunitisingh/.m2/repository/org/apache/avro/avro-mapred/1.7.7/avro-mapred-1.7.7-hadoop2.jar:/Users/sunitisingh/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7.jar:/Users/sunitisingh/.m2/repository/org/apache/avro/avro-ipc/1.7.7/avro-ipc-1.7.7-tests.jar:/Users/sunitisingh/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar:/Users/sunitisingh/.m2/repository/com/twitter/chill_2.10/0.5.0/chill_2.10-0.5.0.jar:/Users/sunitisingh/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/Users/sunitisingh/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar:/Users/sunitisingh/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/Users/sunitisingh/.m2/repository/org/objenesis/objenesis/1.2/objenesis-1.2.jar:/Users/sunitisingh/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/Users/sunitisingh/.m2/repository/org/apache/xbean/xbean-asm5-shaded/4.4/xbean-asm5-shaded-4.4.jar:/Users/sunitisingh/.m2/repository/org/apache/hadoop/hadoop-client/2.2.0/hadoop-client-2.2.0.jar:/Users/sunitisingh/.m2/repository/org/apache/hadoop/hadoop-common/2.2.0/hadoop-common-2.2.0.jar:/Users/sunitisingh/.m2/repository/org/apache/commons/commons-math/2.1/commons-m