identifying newly arrived files in s3 in spark streaming

2016-06-06 Thread pandees waran
I am fairly new to Spark Streaming and I have a basic question about how Spark
Streaming works with an S3 bucket that periodically receives new files, once
every 10 minutes.
When I use Spark Streaming to process the files in this S3 path, will it
process all the files in the path (old + new) every batch, or is there any way
I can make it process only the new files, leaving the already processed old
files in the same path?

Thanks
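
For reference, Spark Streaming's file-based sources (textFileStream/fileStream)
monitor a directory and by default only pick up files whose modification time is
newer than the stream's start, so files already sitting in the path are not
reprocessed every batch. A minimal sketch, assuming an s3a:// URL and a made-up
bucket name:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3NewFilesSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("s3-new-files-sketch")
    // 10-minute batches to roughly match the arrival rate of new files
    val ssc = new StreamingContext(conf, Seconds(600))

    // textFileStream monitors the directory; by default it only processes
    // files that appear after the stream starts, so old files are left alone.
    val lines = ssc.textFileStream("s3a://my-bucket/incoming/")
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}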


Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread Marcelo Vanzin
There should be a spark-defaults.conf file somewhere on your machine;
that's where the config is. You can try to change it, but if you're
using some tool to manage the configuration for you, your changes might
end up being overwritten, so be careful with that.

You can also try "--properties-file /blah" where "/blah" is an empty
file, to start with a clean configuration, and see if that helps.
Although you might end up needing some of the other configs in the
original file.
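
To make that concrete (a hedged illustration only; the conf path below is where
HDP typically puts the file and may not match this cluster, and the cipher suite
named is just an example):

# /usr/hdp/current/spark-client/conf/spark-defaults.conf may contain a line like:
#   spark.ssl.enabledAlgorithms   TLS_RSA_WITH_AES_128_CBC_SHA
# Removing that line falls back to the JDK's default algorithm list; otherwise
# replace it with cipher suites that JDK 8 actually supports.

# Or start spark-shell with a clean configuration to confirm the diagnosis:
touch /tmp/empty.conf
spark-shell --properties-file /tmp/empty.conf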


On Mon, Jun 6, 2016 at 3:16 PM, verylucky...@gmail.com
 wrote:
> Thank you Marcelo.  I don't know how to remove it. Could you please tell me
> how I can remove that configuration?
>
> On Mon, Jun 6, 2016 at 5:04 PM, Marcelo Vanzin  wrote:
>>
>> This sounds like your default Spark configuration has an
>> "enabledAlgorithms" config in the SSL settings, and that is listing an
>> algorithm name that is not available in jdk8. Either remove that
>> configuration (to use the JDK's default algorithm list), or change it
>> so that it lists algorithms supported by jdk8.
>>
>> On Mon, Jun 6, 2016 at 12:31 PM, verylucky Man 
>> wrote:
>> > Hi,
>> >
>> > I have a cluster (Hortonworks supported system) running Apache spark on
>> > 1.5.2 on Java 7, installed by admin. Java 8 is also installed.
>> >
>> > I don't have admin access to this cluster and would like to run spark
>> > (1.5.2
>> > and later versions) on java 8.
>> >
>> > I come from HPC/MPI background. So I naively copied all executables of
>> > spark
>> > "/usr/hdp/current/spark-client/" into my root folder.
>> >
>> > When I run spark-shell from my copied folder, it runs as expected on
>> > java 7.
>> >
>> > When I change $JAVA_HOME to point to java 8, and run spark-shell, I get
>> > the
>> > following error.
>> >
>> > Could you please help me fix this error?
>> >
>> > Exception in thread "main" java.security.NoSuchAlgorithmException: Error
>> > constructing implementation (algorithm: Default, provider: SunJSSE,
>> > class:
>> > sun.security.ssl.SSLContextImpl$DefaultSSLContext) at
>> > java.security.Provider$Service.newInstance(Provider.java:1617) at
>> > sun.security.jca.GetInstance.getInstance(GetInstance.java:236) at
>> > sun.security.jca.GetInstance.getInstance(GetInstance.java:164) at
>> > javax.net.ssl.SSLContext.getInstance(SSLContext.java:156) at
>> > javax.net.ssl.SSLContext.getDefault(SSLContext.java:96) at
>> > org.apache.spark.SSLOptions.liftedTree1$1(SSLOptions.scala:122) at
>> > org.apache.spark.SSLOptions.(SSLOptions.scala:114) at
>> > org.apache.spark.SSLOptions$.parse(SSLOptions.scala:199) at
>> > org.apache.spark.SecurityManager.(SecurityManager.scala:243) at
>> > org.apache.spark.repl.SparkIMain.(SparkIMain.scala:118) at
>> >
>> > org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.(SparkILoop.scala:187)
>> > at
>> > org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:217)
>> > at
>> >
>> > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:949)
>> > at
>> >
>> > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>> > at
>> >
>> > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>> > at
>> >
>> > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>> > at
>> >
>> > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>> > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at
>> > org.apache.spark.repl.Main$.main(Main.scala:31) at
>> > org.apache.spark.repl.Main.main(Main.scala) at
>> > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
>> >
>> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> > at
>> >
>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > at java.lang.reflect.Method.invoke(Method.java:497) at
>> >
>> > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
>> > at
>> > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>> > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at
>> > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) at
>> > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by:
>> > java.io.EOFException at
>> > java.io.DataInputStream.readInt(DataInputStream.java:392) at
>> > sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:653) at
>> > sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56)
>> > at
>> >
>> > sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:225)
>> > at
>> >
>> > sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70)
>> > at java.security.KeyStore.load(KeyStore.java:1445) 

Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread verylucky...@gmail.com
Thank you Marcelo.  I don't know how to remove it. Could you please tell me
how I can remove that configuration?

On Mon, Jun 6, 2016 at 5:04 PM, Marcelo Vanzin  wrote:

> This sounds like your default Spark configuration has an
> "enabledAlgorithms" config in the SSL settings, and that is listing an
> algorithm name that is not available in jdk8. Either remove that
> configuration (to use the JDK's default algorithm list), or change it
> so that it lists algorithms supported by jdk8.
>
> On Mon, Jun 6, 2016 at 12:31 PM, verylucky Man 
> wrote:
> > Hi,
> >
> > I have a cluster (Hortonworks supported system) running Apache spark on
> > 1.5.2 on Java 7, installed by admin. Java 8 is also installed.
> >
> > I don't have admin access to this cluster and would like to run spark
> (1.5.2
> > and later versions) on java 8.
> >
> > I come from HPC/MPI background. So I naively copied all executables of
> spark
> > "/usr/hdp/current/spark-client/" into my root folder.
> >
> > When I run spark-shell from my copied folder, it runs as expected on
> java 7.
> >
> > When I change $JAVA_HOME to point to java 8, and run spark-shell, I get
> the
> > following error.
> >
> > Could you please help me fix this error?
> >
> > Exception in thread "main" java.security.NoSuchAlgorithmException: Error
> > constructing implementation (algorithm: Default, provider: SunJSSE,
> class:
> > sun.security.ssl.SSLContextImpl$DefaultSSLContext) at
> > java.security.Provider$Service.newInstance(Provider.java:1617) at
> > sun.security.jca.GetInstance.getInstance(GetInstance.java:236) at
> > sun.security.jca.GetInstance.getInstance(GetInstance.java:164) at
> > javax.net.ssl.SSLContext.getInstance(SSLContext.java:156) at
> > javax.net.ssl.SSLContext.getDefault(SSLContext.java:96) at
> > org.apache.spark.SSLOptions.liftedTree1$1(SSLOptions.scala:122) at
> > org.apache.spark.SSLOptions.(SSLOptions.scala:114) at
> > org.apache.spark.SSLOptions$.parse(SSLOptions.scala:199) at
> > org.apache.spark.SecurityManager.(SecurityManager.scala:243) at
> > org.apache.spark.repl.SparkIMain.(SparkIMain.scala:118) at
> >
> org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.(SparkILoop.scala:187)
> > at
> org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:217)
> > at
> >
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:949)
> > at
> >
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> > at
> >
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> > at
> >
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> > at
> > org.apache.spark.repl.SparkILoop.org
> $apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at
> > org.apache.spark.repl.Main$.main(Main.scala:31) at
> > org.apache.spark.repl.Main.main(Main.scala) at
> > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:497) at
> >
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
> > at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at
> > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) at
> > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by:
> > java.io.EOFException at
> > java.io.DataInputStream.readInt(DataInputStream.java:392) at
> > sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:653) at
> > sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56)
> at
> >
> sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:225)
> > at
> >
> sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70)
> > at java.security.KeyStore.load(KeyStore.java:1445) at
> >
> sun.security.ssl.TrustManagerFactoryImpl.getCacertsKeyStore(TrustManagerFactoryImpl.java:226)
> > at
> >
> sun.security.ssl.SSLContextImpl$DefaultSSLContext.getDefaultTrustManager(SSLContextImpl.java:767)
> > at
> >
> sun.security.ssl.SSLContextImpl$DefaultSSLContext.(SSLContextImpl.java:733)
> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> >
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> > at
> >
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at
> > 

Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread Marcelo Vanzin
This sounds like your default Spark configuration has an
"enabledAlgorithms" config in the SSL settings, and that is listing an
algorithm name that is not available in jdk8. Either remove that
configuration (to use the JDK's default algorithm list), or change it
so that it lists algorithms supported by jdk8.

On Mon, Jun 6, 2016 at 12:31 PM, verylucky Man  wrote:
> Hi,
>
> I have a cluster (Hortonworks supported system) running Apache spark on
> 1.5.2 on Java 7, installed by admin. Java 8 is also installed.
>
> I don't have admin access to this cluster and would like to run spark (1.5.2
> and later versions) on java 8.
>
> I come from HPC/MPI background. So I naively copied all executables of spark
> "/usr/hdp/current/spark-client/" into my root folder.
>
> When I run spark-shell from my copied folder, it runs as expected on java 7.
>
> When I change $JAVA_HOME to point to java 8, and run spark-shell, I get the
> following error.
>
> Could you please help me fix this error?
>
> Exception in thread "main" java.security.NoSuchAlgorithmException: Error
> constructing implementation (algorithm: Default, provider: SunJSSE, class:
> sun.security.ssl.SSLContextImpl$DefaultSSLContext) at
> java.security.Provider$Service.newInstance(Provider.java:1617) at
> sun.security.jca.GetInstance.getInstance(GetInstance.java:236) at
> sun.security.jca.GetInstance.getInstance(GetInstance.java:164) at
> javax.net.ssl.SSLContext.getInstance(SSLContext.java:156) at
> javax.net.ssl.SSLContext.getDefault(SSLContext.java:96) at
> org.apache.spark.SSLOptions.liftedTree1$1(SSLOptions.scala:122) at
> org.apache.spark.SSLOptions.(SSLOptions.scala:114) at
> org.apache.spark.SSLOptions$.parse(SSLOptions.scala:199) at
> org.apache.spark.SecurityManager.(SecurityManager.scala:243) at
> org.apache.spark.repl.SparkIMain.(SparkIMain.scala:118) at
> org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.(SparkILoop.scala:187)
> at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:217)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:949)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at
> org.apache.spark.repl.Main$.main(Main.scala:31) at
> org.apache.spark.repl.Main.main(Main.scala) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497) at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) at
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by:
> java.io.EOFException at
> java.io.DataInputStream.readInt(DataInputStream.java:392) at
> sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:653) at
> sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56) at
> sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:225)
> at
> sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70)
> at java.security.KeyStore.load(KeyStore.java:1445) at
> sun.security.ssl.TrustManagerFactoryImpl.getCacertsKeyStore(TrustManagerFactoryImpl.java:226)
> at
> sun.security.ssl.SSLContextImpl$DefaultSSLContext.getDefaultTrustManager(SSLContextImpl.java:767)
> at
> sun.security.ssl.SSLContextImpl$DefaultSSLContext.(SSLContextImpl.java:733)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at
> java.security.Provider$Service.newInstance(Provider.java:1595) ... 28 more
>
>



-- 
Marcelo




Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread verylucky...@gmail.com
Thank you Ted for the reference. I am going through it in detail.


Thank you Marco for your suggestion.
I created a properties file with these two lines

spark.driver.extraJavaOptions -Djsse.enableSNIExtension=false
spark.executor.extraJavaOptions -Djsse.enableSNIExtension=false

and gave this file as input to spark-shell with --properties-file.

I still get the same error.
Do you recommend giving these flags differently?


Thank you again!
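
For reference, the same system property can also be passed straight on the
spark-shell command line rather than through a properties file (a sketch only;
whether it resolves this particular keystore error is a separate question):

spark-shell \
  --driver-java-options "-Djsse.enableSNIExtension=false" \
  --conf "spark.executor.extraJavaOptions=-Djsse.enableSNIExtension=false"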

On Mon, Jun 6, 2016 at 4:02 PM, Marco Mistroni  wrote:

> HI
>  have you tried to add this flag?
>
> -Djsse.enableSNIExtension=false
>
> i had similar issues in another standalone application when i switched to
> java8 from java7
> hth
>  marco
>
> On Mon, Jun 6, 2016 at 9:58 PM, Koert Kuipers  wrote:
>
>> mhh i would not be very happy if the implication is that i have to start
>> maintaining separate spark builds for client clusters that use java 8...
>>
>> On Mon, Jun 6, 2016 at 4:34 PM, Ted Yu  wrote:
>>
>>> Please see:
>>> https://spark.apache.org/docs/latest/security.html
>>>
>>> w.r.t. Java 8, probably you need to rebuild 1.5.2 using Java 8.
>>>
>>> Cheers
>>>
>>> On Mon, Jun 6, 2016 at 1:19 PM, verylucky...@gmail.com <
>>> verylucky...@gmail.com> wrote:
>>>
 Thank you for your response.

 I have seen this and couple of other similar ones about java ssl in
 general. However, I am not sure how it applies to Spark and specifically to
 my case.

 This error I mention above occurs when I switch from java 7 to java 8
 by changing the env variable JAVA_HOME.

 The error occurs seems to occur at the time of starting Jetty
 HTTPServer.

 Can you please point me to resources that help me understand how
 security is managed in Spark and how changing from java 7 to 8 can mess up
 these configurations?


 Thank you!

 On Mon, Jun 6, 2016 at 2:37 PM, Ted Yu  wrote:

> Have you seen this ?
>
>
> http://stackoverflow.com/questions/22423063/java-exception-on-sslsocket-creation
>
> On Mon, Jun 6, 2016 at 12:31 PM, verylucky Man  > wrote:
>
>> Hi,
>>
>> I have a cluster (Hortonworks supported system) running Apache spark
>> on 1.5.2 on Java 7, installed by admin. Java 8 is also installed.
>>
>> I don't have admin access to this cluster and would like to run spark
>> (1.5.2 and later versions) on java 8.
>>
>> I come from HPC/MPI background. So I naively copied all executables
>> of spark "/usr/hdp/current/spark-client/" into my root folder.
>>
>> When I run spark-shell from my copied folder, it runs as expected on
>> java 7.
>>
>> When I change $JAVA_HOME to point to java 8, and run spark-shell, I
>> get the following error.
>>
>> Could you please help me fix this error?
>>
>> Exception in thread "main" java.security.NoSuchAlgorithmException:
>> Error constructing implementation (algorithm: Default, provider:
>> SunJSSE, class: sun.security.ssl.SSLContextImpl$DefaultSSLContext)
>> at java.security.Provider$Service.newInstance(Provider.java:1617) at
>> sun.security.jca.GetInstance.getInstance(GetInstance.java:236) at sun
>> .security.jca.GetInstance.getInstance(GetInstance.java:164) at javax.
>> net.ssl.SSLContext.getInstance(SSLContext.java:156) at javax.net.ssl.
>> SSLContext.getDefault(SSLContext.java:96) at org.apache.spark.
>> SSLOptions.liftedTree1$1(SSLOptions.scala:122) at org.apache.spark.
>> SSLOptions.(SSLOptions.scala:114) at org.apache.spark.
>> SSLOptions$.parse(SSLOptions.scala:199) at org.apache.spark.
>> SecurityManager.(SecurityManager.scala:243) at org.apache.spark
>> .repl.SparkIMain.(SparkIMain.scala:118) at org.apache.spark.
>> repl.SparkILoop$SparkILoopInterpreter.(SparkILoop.scala:187)
>> at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.
>> scala:217) at org.apache.spark.repl.
>> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.
>> apply$mcZ$sp(SparkILoop.scala:949) at org.apache.spark.repl.
>> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply
>> (SparkILoop.scala:945) at org.apache.spark.repl.
>> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply
>> (SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.
>> savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.
>> repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.
>> scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.
>> scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org
>> .apache.spark.repl.Main.main(Main.scala) at sun.reflect.
>> NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.
>> 

Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread Marco Mistroni
Hi,
have you tried adding this flag?

-Djsse.enableSNIExtension=false

I had similar issues in another standalone application when I switched to
java8 from java7.
hth
 marco

On Mon, Jun 6, 2016 at 9:58 PM, Koert Kuipers  wrote:

> mhh i would not be very happy if the implication is that i have to start
> maintaining separate spark builds for client clusters that use java 8...
>
> On Mon, Jun 6, 2016 at 4:34 PM, Ted Yu  wrote:
>
>> Please see:
>> https://spark.apache.org/docs/latest/security.html
>>
>> w.r.t. Java 8, probably you need to rebuild 1.5.2 using Java 8.
>>
>> Cheers
>>
>> On Mon, Jun 6, 2016 at 1:19 PM, verylucky...@gmail.com <
>> verylucky...@gmail.com> wrote:
>>
>>> Thank you for your response.
>>>
>>> I have seen this and couple of other similar ones about java ssl in
>>> general. However, I am not sure how it applies to Spark and specifically to
>>> my case.
>>>
>>> This error I mention above occurs when I switch from java 7 to java 8 by
>>> changing the env variable JAVA_HOME.
>>>
>>> The error occurs seems to occur at the time of starting Jetty HTTPServer.
>>>
>>> Can you please point me to resources that help me understand how
>>> security is managed in Spark and how changing from java 7 to 8 can mess up
>>> these configurations?
>>>
>>>
>>> Thank you!
>>>
>>> On Mon, Jun 6, 2016 at 2:37 PM, Ted Yu  wrote:
>>>
 Have you seen this ?


 http://stackoverflow.com/questions/22423063/java-exception-on-sslsocket-creation

 On Mon, Jun 6, 2016 at 12:31 PM, verylucky Man 
 wrote:

> Hi,
>
> I have a cluster (Hortonworks supported system) running Apache spark
> on 1.5.2 on Java 7, installed by admin. Java 8 is also installed.
>
> I don't have admin access to this cluster and would like to run spark
> (1.5.2 and later versions) on java 8.
>
> I come from HPC/MPI background. So I naively copied all executables of
> spark "/usr/hdp/current/spark-client/" into my root folder.
>
> When I run spark-shell from my copied folder, it runs as expected on
> java 7.
>
> When I change $JAVA_HOME to point to java 8, and run spark-shell, I
> get the following error.
>
> Could you please help me fix this error?
>
> Exception in thread "main" java.security.NoSuchAlgorithmException:
> Error constructing implementation (algorithm: Default, provider:
> SunJSSE, class: sun.security.ssl.SSLContextImpl$DefaultSSLContext) at
> java.security.Provider$Service.newInstance(Provider.java:1617) at sun.
> security.jca.GetInstance.getInstance(GetInstance.java:236) at sun.
> security.jca.GetInstance.getInstance(GetInstance.java:164) at javax.
> net.ssl.SSLContext.getInstance(SSLContext.java:156) at javax.net.ssl.
> SSLContext.getDefault(SSLContext.java:96) at org.apache.spark.
> SSLOptions.liftedTree1$1(SSLOptions.scala:122) at org.apache.spark.
> SSLOptions.(SSLOptions.scala:114) at org.apache.spark.
> SSLOptions$.parse(SSLOptions.scala:199) at org.apache.spark.
> SecurityManager.(SecurityManager.scala:243) at org.apache.spark.
> repl.SparkIMain.(SparkIMain.scala:118) at org.apache.spark.repl.
> SparkILoop$SparkILoopInterpreter.(SparkILoop.scala:187) at org.
> apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:217)
> at org.apache.spark.repl.
> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.
> apply$mcZ$sp(SparkILoop.scala:949) at org.apache.spark.repl.
> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(
> SparkILoop.scala:945) at org.apache.spark.repl.
> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(
> SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.
> savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.
> repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.
> scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.
> scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.
> apache.spark.repl.Main.main(Main.scala) at sun.reflect.
> NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.
> NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.
> invoke(Method.java:497) at org.apache.spark.deploy.SparkSubmit$.
> org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:
> 180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:
> 205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:
> 120) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.EOFException at 

Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread Koert Kuipers
Mhh, I would not be very happy if the implication is that I have to start
maintaining separate Spark builds for client clusters that use Java 8...

On Mon, Jun 6, 2016 at 4:34 PM, Ted Yu  wrote:

> Please see:
> https://spark.apache.org/docs/latest/security.html
>
> w.r.t. Java 8, probably you need to rebuild 1.5.2 using Java 8.
>
> Cheers
>
> On Mon, Jun 6, 2016 at 1:19 PM, verylucky...@gmail.com <
> verylucky...@gmail.com> wrote:
>
>> Thank you for your response.
>>
>> I have seen this and couple of other similar ones about java ssl in
>> general. However, I am not sure how it applies to Spark and specifically to
>> my case.
>>
>> This error I mention above occurs when I switch from java 7 to java 8 by
>> changing the env variable JAVA_HOME.
>>
>> The error occurs seems to occur at the time of starting Jetty HTTPServer.
>>
>> Can you please point me to resources that help me understand how security
>> is managed in Spark and how changing from java 7 to 8 can mess up these
>> configurations?
>>
>>
>> Thank you!
>>
>> On Mon, Jun 6, 2016 at 2:37 PM, Ted Yu  wrote:
>>
>>> Have you seen this ?
>>>
>>>
>>> http://stackoverflow.com/questions/22423063/java-exception-on-sslsocket-creation
>>>
>>> On Mon, Jun 6, 2016 at 12:31 PM, verylucky Man 
>>> wrote:
>>>
 Hi,

 I have a cluster (Hortonworks supported system) running Apache spark on
 1.5.2 on Java 7, installed by admin. Java 8 is also installed.

 I don't have admin access to this cluster and would like to run spark
 (1.5.2 and later versions) on java 8.

 I come from HPC/MPI background. So I naively copied all executables of
 spark "/usr/hdp/current/spark-client/" into my root folder.

 When I run spark-shell from my copied folder, it runs as expected on
 java 7.

 When I change $JAVA_HOME to point to java 8, and run spark-shell, I get
 the following error.

 Could you please help me fix this error?

 Exception in thread "main" java.security.NoSuchAlgorithmException:
 Error constructing implementation (algorithm: Default, provider:
 SunJSSE, class: sun.security.ssl.SSLContextImpl$DefaultSSLContext) at
 java.security.Provider$Service.newInstance(Provider.java:1617) at sun.
 security.jca.GetInstance.getInstance(GetInstance.java:236) at sun.
 security.jca.GetInstance.getInstance(GetInstance.java:164) at javax.net
 .ssl.SSLContext.getInstance(SSLContext.java:156) at javax.net.ssl.
 SSLContext.getDefault(SSLContext.java:96) at org.apache.spark.
 SSLOptions.liftedTree1$1(SSLOptions.scala:122) at org.apache.spark.
 SSLOptions.(SSLOptions.scala:114) at org.apache.spark.SSLOptions$
 .parse(SSLOptions.scala:199) at org.apache.spark.SecurityManager.>>> >(SecurityManager.scala:243) at org.apache.spark.repl.SparkIMain.>>> >(SparkIMain.scala:118) at org.apache.spark.repl.
 SparkILoop$SparkILoopInterpreter.(SparkILoop.scala:187) at org.
 apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:217)
 at org.apache.spark.repl.
 SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.
 apply$mcZ$sp(SparkILoop.scala:949) at org.apache.spark.repl.
 SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(
 SparkILoop.scala:945) at org.apache.spark.repl.
 SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(
 SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.
 savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.
 repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.
 scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala
 :1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache
 .spark.repl.Main.main(Main.scala) at sun.reflect.
 NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.
 NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(
 DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.
 invoke(Method.java:497) at org.apache.spark.deploy.SparkSubmit$.
 org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685) at
 org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
 at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) at
 org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by:
 java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream
 .java:392) at sun.security.provider.JavaKeyStore.engineLoad(
 JavaKeyStore.java:653) at sun.security.provider.JavaKeyStore$JKS.
 engineLoad(JavaKeyStore.java:56) at sun.security.provider.
 KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:225) at sun.
 

Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread Ted Yu
Please see:
https://spark.apache.org/docs/latest/security.html

w.r.t. Java 8, probably you need to rebuild 1.5.2 using Java 8.

Cheers
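
For anyone trying that route, a rough sketch of a Java 8 rebuild of the 1.5.2
sources, assuming the Maven build with the YARN and Hadoop 2.6 profiles (the
JAVA_HOME path and the profile choice are guesses for this cluster):

export JAVA_HOME=/usr/lib/jvm/java-1.8.0   # assumed JDK 8 install path
cd spark-1.5.2
build/mvn -Pyarn -Phadoop-2.6 -DskipTests clean package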

On Mon, Jun 6, 2016 at 1:19 PM, verylucky...@gmail.com <
verylucky...@gmail.com> wrote:

> Thank you for your response.
>
> I have seen this and couple of other similar ones about java ssl in
> general. However, I am not sure how it applies to Spark and specifically to
> my case.
>
> This error I mention above occurs when I switch from java 7 to java 8 by
> changing the env variable JAVA_HOME.
>
> The error occurs seems to occur at the time of starting Jetty HTTPServer.
>
> Can you please point me to resources that help me understand how security
> is managed in Spark and how changing from java 7 to 8 can mess up these
> configurations?
>
>
> Thank you!
>
> On Mon, Jun 6, 2016 at 2:37 PM, Ted Yu  wrote:
>
>> Have you seen this ?
>>
>>
>> http://stackoverflow.com/questions/22423063/java-exception-on-sslsocket-creation
>>
>> On Mon, Jun 6, 2016 at 12:31 PM, verylucky Man 
>> wrote:
>>
>>> Hi,
>>>
>>> I have a cluster (Hortonworks supported system) running Apache spark on
>>> 1.5.2 on Java 7, installed by admin. Java 8 is also installed.
>>>
>>> I don't have admin access to this cluster and would like to run spark
>>> (1.5.2 and later versions) on java 8.
>>>
>>> I come from HPC/MPI background. So I naively copied all executables of
>>> spark "/usr/hdp/current/spark-client/" into my root folder.
>>>
>>> When I run spark-shell from my copied folder, it runs as expected on
>>> java 7.
>>>
>>> When I change $JAVA_HOME to point to java 8, and run spark-shell, I get
>>> the following error.
>>>
>>> Could you please help me fix this error?
>>>
>>> Exception in thread "main" java.security.NoSuchAlgorithmException: Error
>>> constructing implementation (algorithm: Default, provider: SunJSSE,
>>> class: sun.security.ssl.SSLContextImpl$DefaultSSLContext) at java.
>>> security.Provider$Service.newInstance(Provider.java:1617) at sun.
>>> security.jca.GetInstance.getInstance(GetInstance.java:236) at sun.
>>> security.jca.GetInstance.getInstance(GetInstance.java:164) at javax.net.
>>> ssl.SSLContext.getInstance(SSLContext.java:156) at javax.net.ssl.
>>> SSLContext.getDefault(SSLContext.java:96) at org.apache.spark.SSLOptions
>>> .liftedTree1$1(SSLOptions.scala:122) at org.apache.spark.SSLOptions.<
>>> init>(SSLOptions.scala:114) at org.apache.spark.SSLOptions$.parse(
>>> SSLOptions.scala:199) at org.apache.spark.SecurityManager.(
>>> SecurityManager.scala:243) at org.apache.spark.repl.SparkIMain.(
>>> SparkIMain.scala:118) at org.apache.spark.repl.
>>> SparkILoop$SparkILoopInterpreter.(SparkILoop.scala:187) at org.
>>> apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:217) at
>>> org.apache.spark.repl.
>>> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.
>>> apply$mcZ$sp(SparkILoop.scala:949) at org.apache.spark.repl.
>>> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(
>>> SparkILoop.scala:945) at org.apache.spark.repl.
>>> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(
>>> SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.
>>> savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl
>>> .SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:
>>> 945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>> at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.
>>> repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.
>>> invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(
>>> NativeMethodAccessorImpl.java:62) at sun.reflect.
>>> DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43
>>> ) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.
>>> spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(
>>> SparkSubmit.scala:685) at org.apache.spark.deploy.SparkSubmit$.
>>> doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.
>>> SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.
>>> SparkSubmit$.main(SparkSubmit.scala:120) at org.apache.spark.deploy.
>>> SparkSubmit.main(SparkSubmit.scala) Caused by: java.io.EOFException at
>>> java.io.DataInputStream.readInt(DataInputStream.java:392) at sun.
>>> security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:653) at sun.
>>> security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56) at
>>> sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.
>>> java:225) at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad
>>> (JavaKeyStore.java:70) at java.security.KeyStore.load(KeyStore.java:1445
>>> ) at sun.security.ssl.TrustManagerFactoryImpl.getCacertsKeyStore(
>>> TrustManagerFactoryImpl.java:226) at sun.security.ssl.
>>> SSLContextImpl$DefaultSSLContext.getDefaultTrustManager(SSLContextImpl.
>>> java:767) at 

Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread verylucky...@gmail.com
Thank you for your response.

I have seen this and a couple of other similar ones about Java SSL in
general. However, I am not sure how it applies to Spark and specifically to
my case.

The error I mention above occurs when I switch from Java 7 to Java 8 by
changing the env variable JAVA_HOME.

The error seems to occur at the time of starting the Jetty HTTP server.

Can you please point me to resources that help me understand how security
is managed in Spark and how changing from Java 7 to 8 can mess up these
configurations?


Thank you!

On Mon, Jun 6, 2016 at 2:37 PM, Ted Yu  wrote:

> Have you seen this ?
>
>
> http://stackoverflow.com/questions/22423063/java-exception-on-sslsocket-creation
>
> On Mon, Jun 6, 2016 at 12:31 PM, verylucky Man 
> wrote:
>
>> Hi,
>>
>> I have a cluster (Hortonworks supported system) running Apache spark on
>> 1.5.2 on Java 7, installed by admin. Java 8 is also installed.
>>
>> I don't have admin access to this cluster and would like to run spark
>> (1.5.2 and later versions) on java 8.
>>
>> I come from HPC/MPI background. So I naively copied all executables of
>> spark "/usr/hdp/current/spark-client/" into my root folder.
>>
>> When I run spark-shell from my copied folder, it runs as expected on java
>> 7.
>>
>> When I change $JAVA_HOME to point to java 8, and run spark-shell, I get
>> the following error.
>>
>> Could you please help me fix this error?
>>
>> Exception in thread "main" java.security.NoSuchAlgorithmException: Error
>> constructing implementation (algorithm: Default, provider: SunJSSE, class
>> : sun.security.ssl.SSLContextImpl$DefaultSSLContext) at java.security.
>> Provider$Service.newInstance(Provider.java:1617) at sun.security.jca.
>> GetInstance.getInstance(GetInstance.java:236) at sun.security.jca.
>> GetInstance.getInstance(GetInstance.java:164) at javax.net.ssl.SSLContext
>> .getInstance(SSLContext.java:156) at javax.net.ssl.SSLContext.getDefault(
>> SSLContext.java:96) at org.apache.spark.SSLOptions.liftedTree1$1(
>> SSLOptions.scala:122) at org.apache.spark.SSLOptions.(SSLOptions.
>> scala:114) at org.apache.spark.SSLOptions$.parse(SSLOptions.scala:199)
>> at org.apache.spark.SecurityManager.(SecurityManager.scala:243) at
>> org.apache.spark.repl.SparkIMain.(SparkIMain.scala:118) at org.
>> apache.spark.repl.SparkILoop$SparkILoopInterpreter.(SparkILoop.
>> scala:187) at org.apache.spark.repl.SparkILoop.createInterpreter(
>> SparkILoop.scala:217) at org.apache.spark.repl.
>> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.
>> apply$mcZ$sp(SparkILoop.scala:949) at org.apache.spark.repl.
>> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(
>> SparkILoop.scala:945) at org.apache.spark.repl.
>> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(
>> SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.
>> savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.
>> SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945
>> ) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at
>> org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.
>> Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(
>> Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(
>> NativeMethodAccessorImpl.java:62) at sun.reflect.
>> DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.
>> deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(
>> SparkSubmit.scala:685) at org.apache.spark.deploy.SparkSubmit$.
>> doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.
>> SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.
>> SparkSubmit$.main(SparkSubmit.scala:120) at org.apache.spark.deploy.
>> SparkSubmit.main(SparkSubmit.scala) Caused by: java.io.EOFException at
>> java.io.DataInputStream.readInt(DataInputStream.java:392) at sun.security
>> .provider.JavaKeyStore.engineLoad(JavaKeyStore.java:653) at sun.security.
>> provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56) at sun.
>> security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:225
>> ) at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(
>> JavaKeyStore.java:70) at java.security.KeyStore.load(KeyStore.java:1445)
>> at sun.security.ssl.TrustManagerFactoryImpl.getCacertsKeyStore(
>> TrustManagerFactoryImpl.java:226) at sun.security.ssl.
>> SSLContextImpl$DefaultSSLContext.getDefaultTrustManager(SSLContextImpl.
>> java:767) at sun.security.ssl.SSLContextImpl$DefaultSSLContext.(
>> SSLContextImpl.java:733) at sun.reflect.NativeConstructorAccessorImpl.
>> newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.
>> newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.
>> DelegatingConstructorAccessorImpl.newInstance(
>> 

Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread Ted Yu
Have you seen this ?

http://stackoverflow.com/questions/22423063/java-exception-on-sslsocket-creation

On Mon, Jun 6, 2016 at 12:31 PM, verylucky Man 
wrote:

> Hi,
>
> I have a cluster (Hortonworks supported system) running Apache spark on
> 1.5.2 on Java 7, installed by admin. Java 8 is also installed.
>
> I don't have admin access to this cluster and would like to run spark
> (1.5.2 and later versions) on java 8.
>
> I come from HPC/MPI background. So I naively copied all executables of
> spark "/usr/hdp/current/spark-client/" into my root folder.
>
> When I run spark-shell from my copied folder, it runs as expected on java
> 7.
>
> When I change $JAVA_HOME to point to java 8, and run spark-shell, I get
> the following error.
>
> Could you please help me fix this error?
>
> Exception in thread "main" java.security.NoSuchAlgorithmException: Error
> constructing implementation (algorithm: Default, provider: SunJSSE, class:
> sun.security.ssl.SSLContextImpl$DefaultSSLContext) at java.security.
> Provider$Service.newInstance(Provider.java:1617) at sun.security.jca.
> GetInstance.getInstance(GetInstance.java:236) at sun.security.jca.
> GetInstance.getInstance(GetInstance.java:164) at javax.net.ssl.SSLContext.
> getInstance(SSLContext.java:156) at javax.net.ssl.SSLContext.getDefault(
> SSLContext.java:96) at org.apache.spark.SSLOptions.liftedTree1$1(
> SSLOptions.scala:122) at org.apache.spark.SSLOptions.(SSLOptions.
> scala:114) at org.apache.spark.SSLOptions$.parse(SSLOptions.scala:199) at
> org.apache.spark.SecurityManager.(SecurityManager.scala:243) at org.
> apache.spark.repl.SparkIMain.(SparkIMain.scala:118) at org.apache.
> spark.repl.SparkILoop$SparkILoopInterpreter.(SparkILoop.scala:187)
> at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:217
> ) at org.apache.spark.repl.
> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.
> apply$mcZ$sp(SparkILoop.scala:949) at org.apache.spark.repl.
> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(
> SparkILoop.scala:945) at org.apache.spark.repl.
> SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(
> SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.
> savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.
> SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.
> apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.
> main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method) at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62) at sun.reflect.
> DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.
> deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(
> SparkSubmit.scala:685) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1
> (SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(
> SparkSubmit.scala:120) at org.apache.spark.deploy.SparkSubmit.main(
> SparkSubmit.scala) Caused by: java.io.EOFException at java.io.
> DataInputStream.readInt(DataInputStream.java:392) at sun.security.provider
> .JavaKeyStore.engineLoad(JavaKeyStore.java:653) at sun.security.provider.
> JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56) at sun.security.provider
> .KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:225) at sun.security.
> provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70) at
> java.security.KeyStore.load(KeyStore.java:1445) at sun.security.ssl.
> TrustManagerFactoryImpl.getCacertsKeyStore(TrustManagerFactoryImpl.java:
> 226) at sun.security.ssl.SSLContextImpl$DefaultSSLContext.
> getDefaultTrustManager(SSLContextImpl.java:767) at sun.security.ssl.
> SSLContextImpl$DefaultSSLContext.(SSLContextImpl.java:733) at sun.
> reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.
> reflect.NativeConstructorAccessorImpl.newInstance(
> NativeConstructorAccessorImpl.java:62) at sun.reflect.
> DelegatingConstructorAccessorImpl.newInstance(
> DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.
> Constructor.newInstance(Constructor.java:422) at java.security.
> Provider$Service.newInstance(Provider.java:1595) ... 28 more
>
>
>


Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread verylucky Man
Hi,

I have a cluster (a Hortonworks-supported system) running Apache Spark 1.5.2
on Java 7, installed by the admin. Java 8 is also installed.

I don't have admin access to this cluster and would like to run Spark
(1.5.2 and later versions) on Java 8.

I come from an HPC/MPI background, so I naively copied all the executables of
Spark from "/usr/hdp/current/spark-client/" into my root folder.

When I run spark-shell from my copied folder, it runs as expected on Java 7.

When I change $JAVA_HOME to point to Java 8 and run spark-shell, I get the
following error.

Could you please help me fix this error?

Exception in thread "main" java.security.NoSuchAlgorithmException: Error constructing implementation (algorithm: Default, provider: SunJSSE, class: sun.security.ssl.SSLContextImpl$DefaultSSLContext)
  at java.security.Provider$Service.newInstance(Provider.java:1617)
  at sun.security.jca.GetInstance.getInstance(GetInstance.java:236)
  at sun.security.jca.GetInstance.getInstance(GetInstance.java:164)
  at javax.net.ssl.SSLContext.getInstance(SSLContext.java:156)
  at javax.net.ssl.SSLContext.getDefault(SSLContext.java:96)
  at org.apache.spark.SSLOptions.liftedTree1$1(SSLOptions.scala:122)
  at org.apache.spark.SSLOptions.<init>(SSLOptions.scala:114)
  at org.apache.spark.SSLOptions$.parse(SSLOptions.scala:199)
  at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:243)
  at org.apache.spark.repl.SparkIMain.<init>(SparkIMain.scala:118)
  at org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.<init>(SparkILoop.scala:187)
  at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:217)
  at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:949)
  at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
  at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
  at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
  at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
  at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
  at org.apache.spark.repl.Main$.main(Main.scala:31)
  at org.apache.spark.repl.Main.main(Main.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.EOFException
  at java.io.DataInputStream.readInt(DataInputStream.java:392)
  at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:653)
  at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56)
  at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:225)
  at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70)
  at java.security.KeyStore.load(KeyStore.java:1445)
  at sun.security.ssl.TrustManagerFactoryImpl.getCacertsKeyStore(TrustManagerFactoryImpl.java:226)
  at sun.security.ssl.SSLContextImpl$DefaultSSLContext.getDefaultTrustManager(SSLContextImpl.java:767)
  at sun.security.ssl.SSLContextImpl$DefaultSSLContext.<init>(SSLContextImpl.java:733)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at java.security.Provider$Service.newInstance(Provider.java:1595)
  ... 28 more


Re: Specify node where driver should run

2016-06-06 Thread Bryan Cutler
I'm not an expert on YARN, so anyone please correct me if I'm wrong, but I
believe the Resource Manager will schedule the application, which runs in the
AM, on any node that has a Node Manager, depending on available resources.
So you would normally query the RM via the REST API to determine which node
that is. You can restrict which nodes get scheduled using the property
spark.yarn.am.nodeLabelExpression.
See here for details
http://spark.apache.org/docs/latest/running-on-yarn.html
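
A hedged example of setting that property; the label name "driver-nodes" is
invented and has to exist in your YARN node-label configuration, the class and
jar names are placeholders, and whether it constrains the AM in cluster mode as
well as client mode is worth verifying against the docs for your Spark version:

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.am.nodeLabelExpression=driver-nodes \
  --class com.example.MyApp my-app.jar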

On Mon, Jun 6, 2016 at 9:04 AM, Saiph Kappa  wrote:

> How can I specify the node where application master should run in the yarn
> conf? I haven't found any useful information regarding that.
>
> Thanks.
>
> On Mon, Jun 6, 2016 at 4:52 PM, Bryan Cutler  wrote:
>
>> In that mode, it will run on the application master, whichever node that
>> is as specified in your yarn conf.
>> On Jun 5, 2016 4:54 PM, "Saiph Kappa"  wrote:
>>
>>> Hi,
>>>
>>> In yarn-cluster mode, is there any way to specify on which node I want
>>> the driver to run?
>>>
>>> Thanks.
>>>
>>
>


Re: groupByKey returns an emptyRDD

2016-06-06 Thread Ted Yu
Can you give us a bit more information?

how you packaged the code into a jar
the command you used for execution
the version of Spark
a related log snippet

Thanks

On Mon, Jun 6, 2016 at 10:43 AM, Daniel Haviv <
daniel.ha...@veracity-group.com> wrote:

> Hi,
> I'm wrapped the following code into a jar:
>
> val test = sc.parallelize(Seq(("daniel", "a"), ("daniel", "b"), ("test", 
> "1)")))
>
> val agg = test.groupByKey()
> agg.collect.foreach(r=>{println(r._1)})
>
>
> The result of groupByKey is an empty RDD, when I'm trying the same code using 
> the spark-shell it's running as expected.
>
>
> Any ideas?
>
>
> Thank you,
>
> Daniel
>
>


groupByKey returns an emptyRDD

2016-06-06 Thread Daniel Haviv
Hi,
I've wrapped the following code into a jar:

val test = sc.parallelize(Seq(("daniel", "a"), ("daniel", "b"), ("test", "1)")))

val agg = test.groupByKey()
agg.collect.foreach(r=>{println(r._1)})


The result of groupByKey is an empty RDD; when I try the same
code in the spark-shell it runs as expected.


Any ideas?


Thank you,

Daniel
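
For reference, a minimal self-contained version of the same snippet as it might
look when packaged into a jar and run with spark-submit (object and app names
are placeholders; this is a sketch for comparison, not a diagnosis of the empty
result):

import org.apache.spark.{SparkConf, SparkContext}

object GroupByKeyTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("groupByKey-test"))

    val test = sc.parallelize(Seq(("daniel", "a"), ("daniel", "b"), ("test", "1")))
    val agg = test.groupByKey()
    // groupByKey yields (key, Iterable[value]) pairs; print just the keys
    agg.collect().foreach(r => println(r._1))

    sc.stop()
  }
}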


Re: Dataset Outer Join vs RDD Outer Join

2016-06-06 Thread Michael Armbrust
That kind of stuff is likely fixed in 2.0.  If you can get a reproduction
working there it would be very helpful if you could open a JIRA.

On Mon, Jun 6, 2016 at 7:37 AM, Richard Marscher 
wrote:

> A quick unit test attempt didn't get far replacing map with as[], I'm only
> working against 1.6.1 at the moment though, I was going to try 2.0 but I'm
> having a hard time building a working spark-sql jar from source, the only
> ones I've managed to make are intended for the full assembly fat jar.
>
>
> Example of the error from calling joinWith as left_outer and then
> .as[(Option[T], U]) where T and U are Int and Int.
>
> [info] newinstance(class scala.Tuple2,decodeusingserializer(input[0,
> StructType(StructField(_1,IntegerType,true),
> StructField(_2,IntegerType,true))],scala.Option,true),decodeusingserializer(input[1,
> StructType(StructField(_1,IntegerType,true),
> StructField(_2,IntegerType,true))],scala.Option,true),false,ObjectType(class
> scala.Tuple2),None)
> [info] :- decodeusingserializer(input[0,
> StructType(StructField(_1,IntegerType,true),
> StructField(_2,IntegerType,true))],scala.Option,true)
> [info] :  +- input[0, StructType(StructField(_1,IntegerType,true),
> StructField(_2,IntegerType,true))]
> [info] +- decodeusingserializer(input[1,
> StructType(StructField(_1,IntegerType,true),
> StructField(_2,IntegerType,true))],scala.Option,true)
> [info]+- input[1, StructType(StructField(_1,IntegerType,true),
> StructField(_2,IntegerType,true))]
>
> Cause: java.util.concurrent.ExecutionException: java.lang.Exception:
> failed to compile: org.codehaus.commons.compiler.CompileException: File
> 'generated.java', Line 32, Column 60: No applicable constructor/method
> found for actual parameters "org.apache.spark.sql.catalyst.InternalRow";
> candidates are: "public static java.nio.ByteBuffer
> java.nio.ByteBuffer.wrap(byte[])", "public static java.nio.ByteBuffer
> java.nio.ByteBuffer.wrap(byte[], int, int)"
>
> The generated code is passing InternalRow objects into the ByteBuffer
>
> Starting from two Datasets of types Dataset[(Int, Int)] with expression
> $"left._1" === $"right._1". I'll have to spend some time getting a better
> understanding of this analysis phase, but hopefully I can come up with
> something.
>
> On Wed, Jun 1, 2016 at 3:43 PM, Michael Armbrust 
> wrote:
>
>> Option should place nicely with encoders, but its always possible there
>> are bugs.  I think those function signatures are slightly more expensive
>> (one extra object allocation) and its not as java friendly so we probably
>> don't want them to be the default.
>>
>> That said, I would like to enable that kind of sugar while still taking
>> advantage of all the optimizations going on under the covers.  Can you get
>> it to work if you use `as[...]` instead of `map`?
>>
>> On Wed, Jun 1, 2016 at 11:59 AM, Richard Marscher <
>> rmarsc...@localytics.com> wrote:
>>
>>> Ah thanks, I missed seeing the PR for
>>> https://issues.apache.org/jira/browse/SPARK-15441. If the rows became
>>> null objects then I can implement methods that will map those back to
>>> results that align closer to the RDD interface.
>>>
>>> As a follow on, I'm curious about thoughts regarding enriching the
>>> Dataset join interface versus a package or users sugaring for themselves. I
>>> haven't considered the implications of what the optimizations datasets,
>>> tungsten, and/or bytecode gen can do now regarding joins so I may be
>>> missing a critical benefit there around say avoiding Options in favor of
>>> nulls. If nothing else, I guess Option doesn't have a first class Encoder
>>> or DataType yet and maybe for good reasons.
>>>
>>> I did find the RDD join interface elegant, though. In the ideal world an
>>> API comparable the following would be nice:
>>> https://gist.github.com/rmarsch/3ea78b3a9a8a0e83ce162ed947fcab06
>>>
>>>
>>> On Wed, Jun 1, 2016 at 1:42 PM, Michael Armbrust >> > wrote:
>>>
 Thanks for the feedback.  I think this will address at least some of
 the problems you are describing:
 https://github.com/apache/spark/pull/13425

 On Wed, Jun 1, 2016 at 9:58 AM, Richard Marscher <
 rmarsc...@localytics.com> wrote:

> Hi,
>
> I've been working on transitioning from RDD to Datasets in our
> codebase in anticipation of being able to leverage features of 2.0.
>
> I'm having a lot of difficulties with the impedance mismatches between
> how outer joins worked with RDD versus Dataset. The Dataset joins feel 
> like
> a big step backwards IMO. With RDD, leftOuterJoin would give you Option
> types of the results from the right side of the join. This follows
> idiomatic Scala avoiding nulls and was easy to work with.
>
> Now with Dataset there is only joinWith where you specify the join
> type, but it lost all the semantics of identifying missing data from outer
> joins. I can write some 
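
To make the mismatch concrete, a rough sketch written against a 2.0-style
SparkSession, assuming the fixed behavior discussed above where unmatched rows
come back as null objects (data and names are made up):

import org.apache.spark.sql.SparkSession

object OuterJoinContrast {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("outer-join-contrast").getOrCreate()
    import spark.implicits._

    val left  = Seq((1, "a"), (2, "b")).toDS()
    val right = Seq((1, "x")).toDS()

    // RDD API: the unmatched right side is an explicit Option (None for key 2).
    val rddJoined = left.rdd.keyBy(_._1).leftOuterJoin(right.rdd.keyBy(_._1))

    // Dataset API: joinWith with "left_outer" hands back a null object for the
    // unmatched right side, which the caller maps to an Option by hand.
    val dsJoined = left
      .joinWith(right, left("_1") === right("_1"), "left_outer")
      .map { case (l, r) => (l._1, l._2, Option(r).map(_._2)) }

    rddJoined.collect().foreach(println)
    dsJoined.collect().foreach(println)
    spark.stop()
  }
}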

Re: Unable to set ContextClassLoader in spark shell

2016-06-06 Thread Marcelo Vanzin
On Mon, Jun 6, 2016 at 4:22 AM, shengzhixia  wrote:
> In my previous Java project I could change the class loader without problems.
> Could I ask why the above method couldn't change the class loader in spark shell?
> Is there any way I can achieve it?

The spark-shell for Scala 2.10 will reset the context class loader
every time you run a statement. Not sure if the 2.11 behavior is the
same (don't see the code in Spark, but it might be in the Scala repl
code).

I'm not sure there is any way to work around it.

-- 
Marcelo
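
A small sketch of the behavior being described, run as two separate REPL
statements (the jar path is made up; this only illustrates the reset, it is not
a workaround):

// Statement 1: install a custom context class loader.
val myLoader = new java.net.URLClassLoader(
  Array(new java.io.File("/tmp/extra.jar").toURI.toURL),  // hypothetical jar
  Thread.currentThread.getContextClassLoader)
Thread.currentThread.setContextClassLoader(myLoader)

// Statement 2: in a plain JVM this returns true, but the Scala 2.10 spark-shell
// resets the context class loader per statement, so here it may be false.
Thread.currentThread.getContextClassLoader == myLoader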




Re: Specify node where driver should run

2016-06-06 Thread Saiph Kappa
How can I specify the node where the application master should run in the yarn
conf? I haven't found any useful information regarding that.

Thanks.

On Mon, Jun 6, 2016 at 4:52 PM, Bryan Cutler  wrote:

> In that mode, it will run on the application master, whichever node that
> is as specified in your yarn conf.
> On Jun 5, 2016 4:54 PM, "Saiph Kappa"  wrote:
>
>> Hi,
>>
>> In yarn-cluster mode, is there any way to specify on which node I want
>> the driver to run?
>>
>> Thanks.
>>
>


Re: Specify node where driver should run

2016-06-06 Thread Bryan Cutler
In that mode, it will run in the application master, on whichever node that
ends up being, as determined by your yarn conf.
On Jun 5, 2016 4:54 PM, "Saiph Kappa"  wrote:

> Hi,
>
> In yarn-cluster mode, is there any way to specify on which node I want the
> driver to run?
>
> Thanks.
>


Re: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-06 Thread Alonso Isidoro Roman
Hi, just to update the thread, I have just submitted a simple wordcount job
to yarn using this command:

[cloudera@quickstart simple-word-count]$ spark-submit --class
com.example.Hello --master yarn --deploy-mode cluster --driver-memory
1024Mb --executor-memory 1G --executor-cores 1
target/scala-2.10/test_2.10-1.0.jar

and the process was submitted to the cluster and finished fine; I can see
the correct output. Now I know that the previous process didn't have enough
resources. Now it is a matter of tuning the process...

Running free command outputs this:


[cloudera@quickstart simple-word-count]$ free
             total       used       free     shared    buffers     cached
Mem:       8061104    6687044    1374060       3464       5796     484416
-/+ buffers/cache:    6196832    1864272
Swap:      8388604     687500    7701104

so I can only use about 1 GB...


Alonso Isidoro Roman
https://about.me/alonso.isidoro.roman


2016-06-06 12:03 GMT+02:00 Mich Talebzadeh :

> have you tried master local that should work. This works as a test
>
> ${SPARK_HOME}/bin/spark-submit \
>  --driver-memory 2G \
> --num-executors 1 \
> --executor-memory 2G \
> --master local[2] \
> --executor-cores 2 \
> --conf "spark.scheduler.mode=FAIR" \
> --conf
> "spark.executor.extraJavaOptions=-XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps" \
> --jars
> /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
> --class
> "com.databricks.apps.twitter_classifier.${FILE_NAME}" \
> --conf "spark.ui.port=${SP}" \
> --conf "spark.kryoserializer.buffer.max=512" \
> ${JAR_FILE} \
> ${OUTPUT_DIRECTORY:-/tmp/tweets} \
> ${NUM_TWEETS_TO_COLLECT:-1} \
> ${OUTPUT_FILE_INTERVAL_IN_SECS:-10} \
> ${OUTPUT_FILE_PARTITIONS_EACH_INTERVAL:-1} \
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 6 June 2016 at 10:28, Alonso Isidoro Roman  wrote:
>
>> Hi guys, i finally understand that i cannot use sbt-pack to use
>> programmatically  the spark-streaming job as unix commands, i have to use
>> yarn or mesos  in order to run the jobs.
>>
>> I have some doubts, if i run the spark streaming jogs as yarn client
>> mode, i am receiving this exception:
>>
>> [cloudera@quickstart ~]$ spark-submit --class
>> example.spark.AmazonKafkaConnectorWithMongo --master yarn --deploy-mode
>> client --driver-memory 4g --executor-memory 2g --executor-cores 3
>> /home/cloudera/awesome-recommendation-engine/target/scala-2.10/my-recommendation-spark-engine_2.10-1.0-SNAPSHOT.jar
>> 192.168.1.35:9092 amazonRatingsTopic
>> java.lang.ClassNotFoundException:
>> example.spark.AmazonKafkaConnectorWithMongo
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:270)
>> at org.apache.spark.util.Utils$.classForName(Utils.scala:175)
>> at
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>> But, if i use cluster mode, i have that is job is accepted.
>>
>> [cloudera@quickstart ~]$ spark-submit --class
>> example.spark.AmazonKafkaConnectorWithMongo --master yarn --deploy-mode
>> cluster --driver-memory 4g --executor-memory 2g --executor-cores 2
>> /home/cloudera/awesome-recommendation-engine/target/scala-2.10/my-recommendation-spark-engine_2.10-1.0-SNAPSHOT.jar
>> 192.168.1.35:9092 amazonRatingsTopic
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in
>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>> [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>> explanation.
>> SLF4J: Actual 

subscribe

2016-06-06 Thread Kishorkumar Patil



Re: Dataset Outer Join vs RDD Outer Join

2016-06-06 Thread Richard Marscher
A quick unit test attempt didn't get far replacing map with as[]. I'm only
working against 1.6.1 at the moment, though; I was going to try 2.0, but I'm
having a hard time building a working spark-sql jar from source, and the only
ones I've managed to make are intended for the full assembly fat jar.


Example of the error from calling joinWith with left_outer and then
.as[(Option[T], U)], where T and U are both Int.
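
Roughly, the shape of the call (the dataset construction here is only
illustrative, not the actual test code):

import sqlContext.implicits._
// two Dataset[(Int, Int)] values, aliased so that $"left._1" / $"right._1" resolve
val left  = Seq((1, 10), (2, 20)).toDS().as("left")
val right = Seq((1, 100)).toDS().as("right")
val joined = left.joinWith(right, $"left._1" === $"right._1", "left_outer")
// the cast that triggers the error below
joined.as[(Option[(Int, Int)], (Int, Int))]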

[info] newinstance(class scala.Tuple2,decodeusingserializer(input[0,
StructType(StructField(_1,IntegerType,true),
StructField(_2,IntegerType,true))],scala.Option,true),decodeusingserializer(input[1,
StructType(StructField(_1,IntegerType,true),
StructField(_2,IntegerType,true))],scala.Option,true),false,ObjectType(class
scala.Tuple2),None)
[info] :- decodeusingserializer(input[0,
StructType(StructField(_1,IntegerType,true),
StructField(_2,IntegerType,true))],scala.Option,true)
[info] :  +- input[0, StructType(StructField(_1,IntegerType,true),
StructField(_2,IntegerType,true))]
[info] +- decodeusingserializer(input[1,
StructType(StructField(_1,IntegerType,true),
StructField(_2,IntegerType,true))],scala.Option,true)
[info]+- input[1, StructType(StructField(_1,IntegerType,true),
StructField(_2,IntegerType,true))]

Cause: java.util.concurrent.ExecutionException: java.lang.Exception: failed
to compile: org.codehaus.commons.compiler.CompileException: File
'generated.java', Line 32, Column 60: No applicable constructor/method
found for actual parameters "org.apache.spark.sql.catalyst.InternalRow";
candidates are: "public static java.nio.ByteBuffer
java.nio.ByteBuffer.wrap(byte[])", "public static java.nio.ByteBuffer
java.nio.ByteBuffer.wrap(byte[], int, int)"

The generated code is passing InternalRow objects into the ByteBuffer.wrap call.

This starts from two Datasets of type Dataset[(Int, Int)] with the join
expression $"left._1" === $"right._1". I'll have to spend some time getting a better
understanding of this analysis phase, but hopefully I can come up with
something.

On Wed, Jun 1, 2016 at 3:43 PM, Michael Armbrust 
wrote:

> Option should place nicely with encoders, but its always possible there
> are bugs.  I think those function signatures are slightly more expensive
> (one extra object allocation) and its not as java friendly so we probably
> don't want them to be the default.
>
> That said, I would like to enable that kind of sugar while still taking
> advantage of all the optimizations going on under the covers.  Can you get
> it to work if you use `as[...]` instead of `map`?
>
> On Wed, Jun 1, 2016 at 11:59 AM, Richard Marscher <
> rmarsc...@localytics.com> wrote:
>
>> Ah thanks, I missed seeing the PR for
>> https://issues.apache.org/jira/browse/SPARK-15441. If the rows became
>> null objects then I can implement methods that will map those back to
>> results that align closer to the RDD interface.
>>
>> As a follow on, I'm curious about thoughts regarding enriching the
>> Dataset join interface versus a package or users sugaring for themselves. I
>> haven't considered the implications of what the optimizations datasets,
>> tungsten, and/or bytecode gen can do now regarding joins so I may be
>> missing a critical benefit there around say avoiding Options in favor of
>> nulls. If nothing else, I guess Option doesn't have a first class Encoder
>> or DataType yet and maybe for good reasons.
>>
>> I did find the RDD join interface elegant, though. In the ideal world an
>> API comparable the following would be nice:
>> https://gist.github.com/rmarsch/3ea78b3a9a8a0e83ce162ed947fcab06
>>
>>
>> On Wed, Jun 1, 2016 at 1:42 PM, Michael Armbrust 
>> wrote:
>>
>>> Thanks for the feedback.  I think this will address at least some of the
>>> problems you are describing: https://github.com/apache/spark/pull/13425
>>>
>>> On Wed, Jun 1, 2016 at 9:58 AM, Richard Marscher <
>>> rmarsc...@localytics.com> wrote:
>>>
 Hi,

 I've been working on transitioning from RDD to Datasets in our codebase
 in anticipation of being able to leverage features of 2.0.

 I'm having a lot of difficulties with the impedance mismatches between
 how outer joins worked with RDD versus Dataset. The Dataset joins feel like
 a big step backwards IMO. With RDD, leftOuterJoin would give you Option
 types of the results from the right side of the join. This follows
 idiomatic Scala avoiding nulls and was easy to work with.

 Now with Dataset there is only joinWith where you specify the join
 type, but it lost all the semantics of identifying missing data from outer
 joins. I can write some enriched methods on Dataset with an implicit class
 to abstract messiness away if Dataset nulled out all mismatching data from
 an outer join, however the problem goes even further in that the values
 aren't always null. Integer, for example, defaults to -1 instead of null.
 Now it's completely ambiguous what data in the join was actually there
 

Logging from transformations in PySpark

2016-06-06 Thread Michael Ravits
Hi,

I'd like to send some performance metrics from some of the transformations
to StatsD.
I understood that I should create a new connection to StatsD from each
transformation which I'm afraid would harm performance.
I've also read that there is a workaround for this in Scala by defining an
object as transient.
My question is whether that's also possible in Python with PySpark?
Specifically I'd like to lazily initialize a transient object that will be
used for sending metrics to StatsD over a local socket connection.
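
Roughly the Scala shape I'm trying to reproduce in Python (host, port and
metric names are placeholders, and rdd stands for whatever RDD is being
processed):

import java.net.{DatagramPacket, DatagramSocket, InetAddress}

// a singleton holding the socket: it is created lazily and independently in
// each executor JVM, so nothing is serialized with the closure
object Metrics {
  private lazy val socket = new DatagramSocket()
  private val statsdHost = InetAddress.getByName("localhost")
  def count(metric: String, n: Int = 1): Unit = {
    val payload = s"$metric:$n|c".getBytes("UTF-8")
    socket.send(new DatagramPacket(payload, payload.length, statsdHost, 8125))
  }
}

rdd.foreachPartition { records =>
  records.foreach(_ => Metrics.count("records.processed"))
}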

Thanks,
Michael


Re: Spark SQL - Encoders - case class

2016-06-06 Thread Dave Maughan
Hi,

Thanks for the quick replies. I've tried those suggestions but Eclipse is
showing:

Unable to find encoder for type stored in a Dataset.  Primitive
types (Int, String, etc) and Product types (case classes) are supported by
importing sqlContext.implicits._  Support for serializing other types will
be added in future.


Thanks

- Dave


Re: Spark SQL - Encoders - case class

2016-06-06 Thread Han JU
Hi,

I think encoders for case classes are already provided in spark. You'll
just need to import them.

val sql = new SQLContext(sc)
import sql.implicits._

And then do the cast to Dataset.
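
For example, continuing from the snippet above (the case class and query are
just for illustration):

case class Table1(fooBar: String)

// the implicits bring in an Encoder for any case class (a Product type)
val ds = sql.sql("select foo_bar as fooBar from table1").as[Table1]
ds.show()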

2016-06-06 14:13 GMT+02:00 Dave Maughan :

> Hi,
>
> I've figured out how to select data from a remote Hive instance and encode
> the DataFrame -> Dataset using a Java POJO class:
>
> TestHive.sql("select foo_bar as `fooBar` from table1"
> ).as(Encoders.bean(classOf[Table1])).show()
>
> However, I'm struggling to find out to do the equivalent in Scala if
> Table1 is a case class. Could someone please point me in the right
> direction?
>
> Thanks
> - Dave
>



-- 
*JU Han*

Software Engineer @ Teads.tv

+33 061960


Spark SQL - Encoders - case class

2016-06-06 Thread Dave Maughan
Hi,

I've figured out how to select data from a remote Hive instance and encode
the DataFrame -> Dataset using a Java POJO class:

TestHive.sql("select foo_bar as `fooBar` from table1"
).as(Encoders.bean(classOf[Table1])).show()

However, I'm struggling to find out how to do the equivalent in Scala if Table1
is a case class. Could someone please point me in the right direction?

Thanks
- Dave


Re: How to modify collection inside a spark rdd foreach

2016-06-06 Thread Robineast
It's not that clear what you are trying to achieve - what type is myRDD and
where do k and v come from?

Anyway, it seems you want to end up with a map or a dictionary, which is what
a PairRDD is for, e.g.:

val rdd = sc.makeRDD(Array("1","2","3"))
val pairRDD = rdd.map(el => (el.toInt, el))

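If what you actually need is a local map on the driver rather than a
distributed pair RDD, you can collect it (only sensible for small results):

// brings everything back to the driver as a scala.collection.Map[Int, String]
val localMap = pairRDD.collectAsMap()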

-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action




Unable to set ContextClassLoader in spark shell

2016-06-06 Thread shengzhixia
Hello guys!

I am using spark shell which uses TranslatingClassLoader.

scala> Thread.currentThread().getContextClassLoader
res13: ClassLoader =
org.apache.spark.repl.SparkIMain$TranslatingClassLoader@23c767e6


For some reason I want to use another class loader, but when I do

val myclassloader = // create my own classloader
Thread.currentThread.setContextClassLoader(myclassloader)

The setContextClassLoader doesn't seem to work. I still get a
TranslatingClassLoader:

scala> Thread.currentThread().getContextClassLoader
res13: ClassLoader =
org.apache.spark.repl.SparkIMain$TranslatingClassLoader@23c767e6


In my previous Java project I can change class loader without problem. Could
I know why the above method couldn't change class loader in spark shell? 
Any way I can achieve it?

Thanks!







Re: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-06 Thread Mich Talebzadeh
Have you tried master local? That should work. This works as a test:

${SPARK_HOME}/bin/spark-submit \
    --driver-memory 2G \
    --num-executors 1 \
    --executor-memory 2G \
    --master local[2] \
    --executor-cores 2 \
    --conf "spark.scheduler.mode=FAIR" \
    --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
    --jars /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
    --class "com.databricks.apps.twitter_classifier.${FILE_NAME}" \
    --conf "spark.ui.port=${SP}" \
    --conf "spark.kryoserializer.buffer.max=512" \
    ${JAR_FILE} \
    ${OUTPUT_DIRECTORY:-/tmp/tweets} \
    ${NUM_TWEETS_TO_COLLECT:-1} \
    ${OUTPUT_FILE_INTERVAL_IN_SECS:-10} \
    ${OUTPUT_FILE_PARTITIONS_EACH_INTERVAL:-1}


Dr Mich Talebzadeh

LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com



On 6 June 2016 at 10:28, Alonso Isidoro Roman  wrote:

> Hi guys, i finally understand that i cannot use sbt-pack to use
> programmatically  the spark-streaming job as unix commands, i have to use
> yarn or mesos  in order to run the jobs.
>
> I have some doubts, if i run the spark streaming jogs as yarn client mode,
> i am receiving this exception:
>
> [cloudera@quickstart ~]$ spark-submit --class
> example.spark.AmazonKafkaConnectorWithMongo --master yarn --deploy-mode
> client --driver-memory 4g --executor-memory 2g --executor-cores 3
> /home/cloudera/awesome-recommendation-engine/target/scala-2.10/my-recommendation-spark-engine_2.10-1.0-SNAPSHOT.jar
> 192.168.1.35:9092 amazonRatingsTopic
> java.lang.ClassNotFoundException:
> example.spark.AmazonKafkaConnectorWithMongo
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:175)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
> But, if i use cluster mode, i have that is job is accepted.
>
> [cloudera@quickstart ~]$ spark-submit --class
> example.spark.AmazonKafkaConnectorWithMongo --master yarn --deploy-mode
> cluster --driver-memory 4g --executor-memory 2g --executor-cores 2
> /home/cloudera/awesome-recommendation-engine/target/scala-2.10/my-recommendation-spark-engine_2.10-1.0-SNAPSHOT.jar
> 192.168.1.35:9092 amazonRatingsTopic
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/06/06 11:16:46 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 16/06/06 11:16:46 INFO client.RMProxy: Connecting to ResourceManager at /
> 0.0.0.0:8032
> 16/06/06 11:16:46 INFO yarn.Client: Requesting a new application from
> cluster with 1 NodeManagers
> 16/06/06 11:16:46 INFO yarn.Client: Verifying our application has not
> requested more than the maximum memory capability of the cluster (8192 MB
> per container)
> 16/06/06 11:16:46 INFO yarn.Client: Will allocate AM container, with 4505
> MB memory including 409 MB overhead
> 16/06/06 11:16:46 INFO yarn.Client: Setting up container launch context
> for our AM
> 16/06/06 11:16:46 INFO yarn.Client: Setting up the launch environment for
> our AM container
> 16/06/06 11:16:46 INFO yarn.Client: Preparing resources for our AM
> container
> 16/06/06 11:16:47 WARN shortcircuit.DomainSocketFactory: The short-circuit
> local reads feature cannot be used because libhadoop cannot be loaded.
> 16/06/06 11:16:47 INFO yarn.Client: Uploading resource
> file:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar
> ->
> 

Re: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-06 Thread Alonso Isidoro Roman
Hi guys, I finally understand that I cannot use sbt-pack to run the
spark-streaming job programmatically as a unix command; I have to use
yarn or mesos in order to run the jobs.

I have some doubts. If I run the spark streaming jobs in yarn client mode,
I am receiving this exception:

[cloudera@quickstart ~]$ spark-submit --class
example.spark.AmazonKafkaConnectorWithMongo --master yarn --deploy-mode
client --driver-memory 4g --executor-memory 2g --executor-cores 3
/home/cloudera/awesome-recommendation-engine/target/scala-2.10/my-recommendation-spark-engine_2.10-1.0-SNAPSHOT.jar
192.168.1.35:9092 amazonRatingsTopic
java.lang.ClassNotFoundException:
example.spark.AmazonKafkaConnectorWithMongo
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.util.Utils$.classForName(Utils.scala:175)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


But if I use cluster mode, the job is accepted.

[cloudera@quickstart ~]$ spark-submit --class
example.spark.AmazonKafkaConnectorWithMongo --master yarn --deploy-mode
cluster --driver-memory 4g --executor-memory 2g --executor-cores 2
/home/cloudera/awesome-recommendation-engine/target/scala-2.10/my-recommendation-spark-engine_2.10-1.0-SNAPSHOT.jar
192.168.1.35:9092 amazonRatingsTopic
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/06/06 11:16:46 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
16/06/06 11:16:46 INFO client.RMProxy: Connecting to ResourceManager at /
0.0.0.0:8032
16/06/06 11:16:46 INFO yarn.Client: Requesting a new application from
cluster with 1 NodeManagers
16/06/06 11:16:46 INFO yarn.Client: Verifying our application has not
requested more than the maximum memory capability of the cluster (8192 MB
per container)
16/06/06 11:16:46 INFO yarn.Client: Will allocate AM container, with 4505
MB memory including 409 MB overhead
16/06/06 11:16:46 INFO yarn.Client: Setting up container launch context for
our AM
16/06/06 11:16:46 INFO yarn.Client: Setting up the launch environment for
our AM container
16/06/06 11:16:46 INFO yarn.Client: Preparing resources for our AM container
16/06/06 11:16:47 WARN shortcircuit.DomainSocketFactory: The short-circuit
local reads feature cannot be used because libhadoop cannot be loaded.
16/06/06 11:16:47 INFO yarn.Client: Uploading resource
file:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar
->
hdfs://quickstart.cloudera:8020/user/cloudera/.sparkStaging/application_1465201086091_0006/spark-assembly-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar
16/06/06 11:16:47 INFO yarn.Client: Uploading resource
file:/home/cloudera/awesome-recommendation-engine/target/scala-2.10/my-recommendation-spark-engine_2.10-1.0-SNAPSHOT.jar
->
hdfs://quickstart.cloudera:8020/user/cloudera/.sparkStaging/application_1465201086091_0006/my-recommendation-spark-engine_2.10-1.0-SNAPSHOT.jar
16/06/06 11:16:47 INFO yarn.Client: Uploading resource
file:/tmp/spark-8e5fe800-bed2-4173-bb11-d47b3ab3b621/__spark_conf__5840282197389631291.zip
->
hdfs://quickstart.cloudera:8020/user/cloudera/.sparkStaging/application_1465201086091_0006/__spark_conf__5840282197389631291.zip
16/06/06 11:16:47 INFO spark.SecurityManager: Changing view acls to:
cloudera
16/06/06 11:16:47 INFO spark.SecurityManager: Changing modify acls to:
cloudera
16/06/06 11:16:47 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(cloudera); users with modify permissions: Set(cloudera)
16/06/06 11:16:47 INFO yarn.Client: Submitting application 6 to
ResourceManager
16/06/06 11:16:48 INFO impl.YarnClientImpl: Submitted application
application_1465201086091_0006
16/06/06 11:16:49 INFO yarn.Client: Application report for
application_1465201086091_0006 (state: ACCEPTED)
16/06/06 11:16:49 INFO yarn.Client:

Re: Switching broadcast mechanism from torrrent

2016-06-06 Thread Daniel Haviv
Hi,
I've set spark.broadcast.factory to
org.apache.spark.broadcast.HttpBroadcastFactory and it indeed resolved my
issue.
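
In case it helps anyone else, the setting can go on the SparkConf before the
context is created, or be passed to spark-submit:

import org.apache.spark.SparkConf

// programmatic form, set before the SparkContext is created
val conf = new SparkConf()
  .set("spark.broadcast.factory", "org.apache.spark.broadcast.HttpBroadcastFactory")

// or equivalently on the command line:
//   spark-submit --conf spark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory ...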

I'm creating a dataframe which creates a broadcast variable internally and
then fails due to the torrent broadcast with the following stacktrace:
Caused by: org.apache.spark.SparkException: Failed to get
broadcast_3_piece0 of broadcast_3
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:138)
at scala.Option.getOrElse(Option.scala:120)
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:137)
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.broadcast.TorrentBroadcast.org
$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:120)
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:175)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1220)

I'm using spark 1.6.0 on CDH 5.7

Thanks,
Daniel


On Wed, Jun 1, 2016 at 5:52 PM, Ted Yu  wrote:

> I found spark.broadcast.blockSize but no parameter to switch broadcast
> method.
>
> Can you describe the issues with torrent broadcast in more detail ?
>
> Which version of Spark are you using ?
>
> Thanks
>
> On Wed, Jun 1, 2016 at 7:48 AM, Daniel Haviv <
> daniel.ha...@veracity-group.com> wrote:
>
>> Hi,
>> Our application is failing due to issues with the torrent broadcast, is
>> there a way to switch to another broadcast method ?
>>
>> Thank you.
>> Daniel
>>
>
>


Re: Fw: Basic question on using one's own classes in the Scala app

2016-06-06 Thread Marco Mistroni
Hi Ashok,
this is not really a Spark-related question, so I would not use this
mailing list.
Anyway, my 2 cents here:
as outlined by earlier replies, if the class you are referencing is in a
different jar, at compile time you will need to add that dependency to your
build.sbt.
I'd personally leave $CLASSPATH alone...

AT RUN TIME, you have two options:
1 - As suggested by Ted, when you launch your app via spark-submit you
can use '--jars utilities-assembly-0.1-SNAPSHOT.jar' to pass the jar.
2 - Use the sbt assembly plugin to package your classes and jars into a 'fat
jar', and then at runtime all you need to do is

 spark-submit   --class   

I'd personally go for 1, as it is the easiest option. (FYI, for 2 you
might encounter situations where you have dependencies referring to the same
classes, and that will require you to define an assemblyMergeStrategy.)
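
A minimal sketch of option 1 (the class, project and jar paths below are just
placeholders for your own):

// build.sbt: make the utilities jar visible at compile time, e.g. by dropping
// it into the project's lib/ folder as an unmanaged dependency
unmanagedJars in Compile += file("lib/utilities-assembly-0.1-SNAPSHOT.jar")

# at run time, ship the same jar alongside your application jar
spark-submit \
  --class com.example.MyApp \
  --jars utilities-assembly-0.1-SNAPSHOT.jar \
  myapp_2.10-1.0.jar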

hth




On Mon, Jun 6, 2016 at 8:52 AM, Ashok Kumar 
wrote:

> Anyone can help me with this please
>
>
> On Sunday, 5 June 2016, 11:06, Ashok Kumar  wrote:
>
>
> Hi all,
>
> Appreciate any advice on this. It is about scala
>
> I have created a very basic Utilities.scala that contains a test class and
> method. I intend to add my own classes and methods as I expand and make
> references to these classes and methods in my other apps
>
> class getCheckpointDirectory {
>   def CheckpointDirectory (ProgramName: String) : String  = {
>  var hdfsDir = "hdfs://host:9000/user/user/checkpoint/"+ProgramName
>  return hdfsDir
>   }
> }
> I have used sbt to create a jar file for it. It is created as a jar file
>
> utilities-assembly-0.1-SNAPSHOT.jar
>
> Now I want to make a call to that method CheckpointDirectory in my app
> code myapp.dcala to return the value for hdfsDir
>
>val ProgramName = this.getClass.getSimpleName.trim
>val getCheckpointDirectory =  new getCheckpointDirectory
>val hdfsDir = getCheckpointDirectory.CheckpointDirectory(ProgramName)
>
> However, I am getting a compilation error as expected
>
> not found: type getCheckpointDirectory
> [error] val getCheckpointDirectory =  new getCheckpointDirectory
> [error]   ^
> [error] one error found
> [error] (compile:compileIncremental) Compilation failed
>
> So a basic question, in order for compilation to work do I need to create
> a package for my jar file or add dependency like the following I do in sbt
>
> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"
> libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.1"
> libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.5.1"
>
>
> Or add the jar file to $CLASSPATH?
>
> Any advise will be appreciated.
>
> Thanks
>
>
>
>
>
>
>


Re: Performance of Spark/MapReduce

2016-06-06 Thread Sean Owen
I don't think that quote is true in general. Given a map-only task, or a
map+shuffle+reduce, I'd expect MapReduce to be the same speed or a little
faster. It is the simpler, lower-level, narrower mechanism. It's all
limited by how fast you can read/write data and execute the user code.

There's a big difference if you're executing a many-stage pipeline where a
chain of M/R jobs would be writing back to disk each time, but a Spark job
could stay in memory. This is most of the source of that quote.

I think the argument for Spark is 95% that it's a higher-level API. Writing
M/R takes 10s of times more code. But, I think people were already using
things like Crunch on M/R before Spark anyway.

Spark still adds value with things like a DataFrame API; if you're doing
work that fits its constraints, then it can optimize more under the hood.
M/R is just a job scheduler for user code.
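
For example, a grouped count that would take a full mapper/reducer pair in
M/R is a single declarative call the optimizer can reason about (df being
whatever DataFrame you already have):

val counts = df.groupBy("word").count()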

On Mon, Jun 6, 2016 at 4:12 AM, Deepak Goel  wrote:

>
> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
>
> Sorry about that (The question might still be general as I am new to
> Spark).
>
> My question is:
>
> Spark claims to be 10x times faster on disk and 100x times faster in
> memory as compared to Mapreduce. Is there any benchmark paper for the same
> which sketches out the details? Is the benchmark true for all
> applications/platforms or for a particular platform?
>
> Also has someone made a study as to what are the changes in Spark as
> compared to Mapreduce which causes the performance improvement.
>
> For example:
>
> Change A in Spark v/s Mapreduce (Multiple Spill files in Mapper) > %
> Reduction in the number of instructions > 2X times the performance
> benefit  --- > Any disadvantages like availability or conditions that the
> system should have multiple Disk I/O Channels
>
> Change B in Spark v/s Mapreduce (Difference in data consolidation in
> Reducer) --- % Reduction in the number of instructions --> 1.5X times the
> performance benefit > Any disadvantages like availability
>
> And so on...
>
> Also has a cost analysis been included in such a kind of study. Any case
> studies?
>
> Deepak
>
>
>
>
>
>
>
>
>
>
>
> ===
>
> Two questions:
>
> 1. Is this related to the thread in any way? If not, please start a new
> one, otherwise you confuse people like myself.
>
> 2. This question is so general, do you understand the similarities and
> differences between spark and mapreduce? Learn first, then ask questions.
>
> Spark can map-reduce.
>
> Sent from my iPhone
>
> On Jun 5, 2016, at 4:37 PM, Deepak Goel  wrote:
>
> Hello
>
> Sorry, I am new to Spark.
>
> Spark claims it can do all that what MapReduce can do (and more!) but 10X
> times faster on disk, and 100X faster in memory. Why would then I use
> Mapreduce at all?
>
> Thanks
> Deepak
>
> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
>
>--
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, dee...@simtree.net
> deic...@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more :
> http://www.gridrepublic.org
> "
>
>
>
>--
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, dee...@simtree.net
> deic...@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more :
> http://www.gridrepublic.org
> "
>


Fw: Basic question on using one's own classes in the Scala app

2016-06-06 Thread Ashok Kumar
Can anyone help me with this, please?

On Sunday, 5 June 2016, 11:06, Ashok Kumar  wrote:

Hi all,

Appreciate any advice on this. It is about Scala.

I have created a very basic Utilities.scala that contains a test class and
method. I intend to add my own classes and methods as I expand and make
references to these classes and methods in my other apps.

class getCheckpointDirectory {
  def CheckpointDirectory (ProgramName: String) : String  = {
     var hdfsDir = "hdfs://host:9000/user/user/checkpoint/"+ProgramName
     return hdfsDir
  }
}

I have used sbt to create a jar file for it. It is created as a jar file

utilities-assembly-0.1-SNAPSHOT.jar

Now I want to make a call to that method CheckpointDirectory in my app code
myapp.scala to return the value for hdfsDir

   val ProgramName = this.getClass.getSimpleName.trim
   val getCheckpointDirectory =  new getCheckpointDirectory
   val hdfsDir = getCheckpointDirectory.CheckpointDirectory(ProgramName)

However, I am getting a compilation error as expected

not found: type getCheckpointDirectory
[error]     val getCheckpointDirectory =  new getCheckpointDirectory
[error]                                       ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed

So a basic question: in order for compilation to work, do I need to create a
package for my jar file or add a dependency like the following I do in sbt

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.1"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.5.1"

Or add the jar file to $CLASSPATH?

Any advice will be appreciated.

Thanks