Oh ok Denny, great!!! Also, thanks for your effort in resolving my issue. Can I ask one more (more open-ended) question? We have a requirement to read data from either Blob storage or a Hive table and upsert a few records into CosmosDB.

One option is to run a C# activity on a Windows batch pool; the other is to use Spark. Do you have an opinion on either? I know Spark, so I am a little biased toward the second option and can think of benefits around it, but I want to be sure I am not missing a better solution.
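To make it concrete, here is roughly what I have in mind on the Spark side. This is only an untested sketch: the table name, blob path, and filter predicate are placeholders, and the connector's data source name and option keys (including the "Upsert" switch) are my assumptions to verify against the user guide linked further down this thread.

import org.apache.spark.sql.SparkSession

// Hive support is needed for spark.table(); Blob storage is reachable
// via the wasb:// scheme on HDInsight.
val spark = SparkSession.builder()
  .appName("upsert-to-cosmosdb")
  .enableHiveSupport()
  .getOrCreate()

// Source A: an existing Hive table (placeholder name).
val fromHive = spark.table("mydb.source_table")

// Source B: files already sitting on Blob storage (placeholder path).
val fromBlob = spark.read.parquet("wasb://mycontainer@myaccount.blob.core.windows.net/data/")

// Either source can feed the write; derive the few records to upsert
// (placeholder predicate).
val toUpsert = fromHive.filter("modified_date = current_date()")

// Write through the connector's data source. The format string and the
// option keys below are assumptions; check them against the connector's
// user guide before relying on this.
toUpsert.write
  .format("com.microsoft.azure.documentdb.spark")
  .mode("append")
  .option("Endpoint", "https://<account>.documents.azure.com:443/")
  .option("Masterkey", "<master-key>")
  .option("Database", "<database>")
  .option("Collection", "<collection>")
  .option("Upsert", "true")
  .save()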
Looking forward to hearing your take.

Best regards,
ayan

On Mon, May 15, 2017 at 8:24 AM, Denny Lee <denny.g....@gmail.com> wrote:

> Sorry for the delay. You just did, as I'm on the Azure CosmosDB (formerly
> DocumentDB) team. If you'd like to make it official, why not add an issue
> to the GitHub repo at https://github.com/Azure/azure-documentdb-spark/issues. HTH!
>
> On Thu, May 11, 2017 at 9:08 PM ayan guha <guha.a...@gmail.com> wrote:
>
>> Works for me too....you are a life-saver :)
>>
>> But the question: should we report this to the Azure team, and if so, how?
>>
>> On Fri, May 12, 2017 at 10:32 AM, Denny Lee <denny.g....@gmail.com> wrote:
>>
>>> I was able to repro your issue when I downloaded the jars via the GitHub
>>> *blob* links, but when I downloaded them as raw, I was able to get
>>> everything up and running. For example, downloading with the blob URLs:
>>>
>>> wget https://github.com/Azure/azure-documentdb-spark/blob/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-1.10.0.jar
>>> wget https://github.com/Azure/azure-documentdb-spark/blob/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>>> spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>>
>>> resulted in the error:
>>>
>>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>>> Setting default log level to "WARN".
>>> To adjust logging level use sc.setLogLevel(newLevel).
>>> [init] error: error while loading <root>, Error accessing /home/sshuser/jars/test/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>>>
>>> Failed to initialize compiler: object java.lang.Object in compiler mirror not found.
>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>> ** object programmatically, settings.usejavacp.value = true.
>>>
>>> But when running with the raw URLs:
>>>
>>> wget https://github.com/Azure/azure-documentdb-spark/raw/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-1.10.0.jar
>>> wget https://github.com/Azure/azure-documentdb-spark/raw/master/releases/azure-documentdb-spark-0.0.3_2.0.2_2.11/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>>> spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>>
>>> it was up and running:
>>>
>>> spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>>> Setting default log level to "WARN".
>>> To adjust logging level use sc.setLogLevel(newLevel).
>>> 17/05/11 22:54:06 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
>>> Spark context Web UI available at http://10.0.0.22:4040
>>> Spark context available as 'sc' (master = yarn, app id = application_1494248502247_0013).
>>> Spark session available as 'spark'.
>>> Welcome to
>>>       ____              __
>>>      / __/__  ___ _____/ /__
>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.2.2.5.4.0-121
>>>       /_/
>>>
>>> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_121)
>>> Type in expressions to have them evaluated.
>>> Type :help for more information.
>>>
>>> scala>
>>>
>>> HTH!
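Side note from my end, since the *blob* URL serves GitHub's HTML page rather than the jar itself: the quickest way to catch a bad download before launching spark-shell is to check the file's magic bytes. A minimal check that can be pasted into any Scala REPL (the jar name is taken from the commands above; everything else is plain JDK):

import java.io.FileInputStream

def looksLikeJar(path: String): Boolean = {
  val in = new FileInputStream(path)
  try {
    // A valid jar/zip begins with the magic bytes 'P','K'; a GitHub
    // blob-page download is HTML and begins with '<', so it fails this.
    val header = new Array[Byte](2)
    in.read(header) == 2 && header(0) == 'P'.toByte && header(1) == 'K'.toByte
  } finally in.close()
}

println(looksLikeJar("azure-documentdb-1.10.0.jar")) // false for a blob-page download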
>>>
>>> On Wed, May 10, 2017 at 11:49 PM ayan guha <guha.a...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> Thanks for the reply, but unfortunately it did not work. I am getting
>>>> the same error.
>>>>
>>>> sshuser@ed0-svochd:~/azure-spark-docdb-test$ spark-shell --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>>>> Setting default log level to "WARN".
>>>> To adjust logging level use sc.setLogLevel(newLevel).
>>>> [init] error: error while loading <root>, Error accessing /home/sshuser/azure-spark-docdb-test/azure-documentdb-spark-0.0.3-SNAPSHOT.jar
>>>>
>>>> Failed to initialize compiler: object java.lang.Object in compiler mirror not found.
>>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>>> ** object programmatically, settings.usejavacp.value = true.
>>>>
>>>> Failed to initialize compiler: object java.lang.Object in compiler mirror not found.
>>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>>> ** object programmatically, settings.usejavacp.value = true.
>>>> Exception in thread "main" java.lang.NullPointerException
>>>>         at scala.reflect.internal.SymbolTable.exitingPhase(SymbolTable.scala:256)
>>>>         at scala.tools.nsc.interpreter.IMain$Request.x$20$lzycompute(IMain.scala:896)
>>>>         at scala.tools.nsc.interpreter.IMain$Request.x$20(IMain.scala:895)
>>>>         at scala.tools.nsc.interpreter.IMain$Request.headerPreamble$lzycompute(IMain.scala:895)
>>>>         at scala.tools.nsc.interpreter.IMain$Request.headerPreamble(IMain.scala:895)
>>>>         at scala.tools.nsc.interpreter.IMain$Request$Wrapper.preamble(IMain.scala:918)
>>>>         at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1337)
>>>>         at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1336)
>>>>         at scala.tools.nsc.util.package$.stringFromWriter(package.scala:64)
>>>>         at scala.tools.nsc.interpreter.IMain$CodeAssembler$class.apply(IMain.scala:1336)
>>>>         at scala.tools.nsc.interpreter.IMain$Request$Wrapper.apply(IMain.scala:908)
>>>>         at scala.tools.nsc.interpreter.IMain$Request.compile$lzycompute(IMain.scala:1002)
>>>>         at scala.tools.nsc.interpreter.IMain$Request.compile(IMain.scala:997)
>>>>         at scala.tools.nsc.interpreter.IMain.compile(IMain.scala:579)
>>>>         at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:567)
>>>>         at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
>>>>         at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
>>>>         at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
>>>>         at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
>>>>         at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
>>>>         at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
>>>>         at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
>>>>         at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
>>>>         at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
>>>>         at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:94)
>>>>         at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
>>>>         at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
>>>>         at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
>>>>         at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
>>>>         at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
>>>>         at org.apache.spark.repl.Main$.doMain(Main.scala:68)
>>>>         at org.apache.spark.repl.Main$.main(Main.scala:51)
>>>>         at org.apache.spark.repl.Main.main(Main.scala)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
>>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>> sshuser@ed0-svochd:~/azure-spark-docdb-test$
>>>>
>>>> On Mon, May 8, 2017 at 11:50 PM, Denny Lee <denny.g....@gmail.com> wrote:
>>>>
>>>>> This appears to be an issue with the Spark to DocumentDB connector,
>>>>> specifically version 0.0.1. Could you run the 0.0.3 version of the jar
>>>>> and see if you're still getting the same error? i.e.
>>>>>
>>>>> spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.jar
>>>>>
>>>>> On Mon, May 8, 2017 at 5:01 AM ayan guha <guha.a...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I am facing an issue while trying to use the azure-documentdb connector
>>>>>> from Microsoft; the instructions are on GitHub:
>>>>>> https://github.com/Azure/azure-documentdb-spark/wiki/Azure-DocumentDB-Spark-Connector-User-Guide
>>>>>>
>>>>>> The error while trying to add the jars in spark-shell:
>>>>>>
>>>>>> spark-shell --jars azure-documentdb-spark-0.0.1.jar,azure-documentdb-1.9.6.jar
>>>>>> SPARK_MAJOR_VERSION is set to 2, using Spark2
>>>>>> Setting default log level to "WARN".
>>>>>> To adjust logging level use sc.setLogLevel(newLevel).
>>>>>> [init] error: error while loading <root>, Error accessing /home/sshuser/azure-spark-docdb-test/v1/azure-documentdb-spark-0.0.1.jar
>>>>>>
>>>>>> Failed to initialize compiler: object java.lang.Object in compiler mirror not found.
>>>>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>>>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>>>>> ** object programmatically, settings.usejavacp.value = true.
>>>>>>
>>>>>> Failed to initialize compiler: object java.lang.Object in compiler mirror not found.
>>>>>> ** Note that as of 2.8 scala does not assume use of the java classpath.
>>>>>> ** For the old behavior pass -usejavacp to scala, or if using a Settings
>>>>>> ** object programmatically, settings.usejavacp.value = true.
>>>>>> Exception in thread "main" java.lang.NullPointerException
>>>>>>         at scala.reflect.internal.SymbolTable.exitingPhase(SymbolTable.scala:256)
>>>>>>         at scala.tools.nsc.interpreter.IMain$Request.x$20$lzycompute(IMain.scala:896)
>>>>>>         at scala.tools.nsc.interpreter.IMain$Request.x$20(IMain.scala:895)
>>>>>>         at scala.tools.nsc.interpreter.IMain$Request.headerPreamble$lzycompute(IMain.scala:895)
>>>>>>         at scala.tools.nsc.interpreter.IMain$Request.headerPreamble(IMain.scala:895)
>>>>>>         at scala.tools.nsc.interpreter.IMain$Request$Wrapper.preamble(IMain.scala:918)
>>>>>>         at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1337)
>>>>>>         at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1336)
>>>>>>         at scala.tools.nsc.util.package$.stringFromWriter(package.scala:64)
>>>>>>         at scala.tools.nsc.interpreter.IMain$CodeAssembler$class.apply(IMain.scala:1336)
>>>>>>         at scala.tools.nsc.interpreter.IMain$Request$Wrapper.apply(IMain.scala:908)
>>>>>>         at scala.tools.nsc.interpreter.IMain$Request.compile$lzycompute(IMain.scala:1002)
>>>>>>         at scala.tools.nsc.interpreter.IMain$Request.compile(IMain.scala:997)
>>>>>>         at scala.tools.nsc.interpreter.IMain.compile(IMain.scala:579)
>>>>>>         at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:567)
>>>>>>         at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
>>>>>>         at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
>>>>>>         at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
>>>>>>         at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
>>>>>>         at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
>>>>>>         at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
>>>>>>         at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
>>>>>>         at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
>>>>>>         at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
>>>>>>         at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:94)
>>>>>>         at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
>>>>>>         at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
>>>>>>         at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
>>>>>>         at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
>>>>>>         at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
>>>>>>         at org.apache.spark.repl.Main$.doMain(Main.scala:68)
>>>>>>         at org.apache.spark.repl.Main$.main(Main.scala:51)
>>>>>>         at org.apache.spark.repl.Main.main(Main.scala)
>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>>>>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>>>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>> sshuser@ed0-svochd:~/azure-spark-docdb-test/v1$
>>>>>>
>>>>>> I think I am missing some basic configuration here, or there is a
>>>>>> classpath-related issue. Can anyone help?
>>>>>>
>>>>>> Additional info:
>>>>>> Environment: HDInsight 3.5, based on HDP 2.5
>>>>>>
>>>>>> sshuser@ed0-svochd:~/azure-spark-docdb-test/v1$ echo $JAVA_HOME
>>>>>> /usr/lib/jvm/java-8-openjdk-amd64
>>>>>>
>>>>>> sshuser@ed0-svochd:~/azure-spark-docdb-test/v1$ echo $SPARK_HOME
>>>>>> /usr/hdp/current/spark2-client
>>>>>>
>>>>>> sshuser@ed0-svochd:~/azure-spark-docdb-test/v1$ java -version
>>>>>> openjdk version "1.8.0_121"
>>>>>> OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
>>>>>> OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)
>>>>>>
>>>>>> sshuser@ed0-svochd:~/azure-spark-docdb-test/v1$ uname -a
>>>>>> Linux ed0-svochd 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>>>>> sshuser@ed0-svochd:~/azure-spark-docdb-test/v1$
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Ayan Guha
>>>>
>>>> --
>>>> Best Regards,
>>>> Ayan Guha
>>
>> --
>> Best Regards,
>> Ayan Guha

--
Best Regards,
Ayan Guha