Spark 3.4.1 with Java 11 performance on k8s serverless/autopilot
Hi,

I would like to share my experience with Spark 3.4.1 running on k8s Autopilot, or what some refer to as serverless. My current experience is with Google GKE Autopilot <https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview>. Essentially you specify the cluster name and region and the CSP takes care of the rest. FYI, I am running Java 11 and Spark 3.4.1 on the host submitting spark-submit. The Docker image is also built on Java 11, Spark 3.4.1 and PySpark; the tag explains it: spark-py:3.4.1-scala_2.12-11-jre-slim-buster-java11PlusPackages

The problem I notice is that the cluster starts with an e2-medium node, which has 4GB of memory:

NAME          LOCATION      MASTER_VERSION   MASTER_IP       MACHINE_TYPE  NODE_VERSION     NUM_NODES  STATUS
spark-on-gke  europe-west2  1.27.2-gke.1200  34.147.184.xxx  e2-medium     1.27.2-gke.1200  1          RUNNING

This means the driver starts with that configuration, and at times it takes more than three minutes for the driver to go into RUNNING state. In contrast, I did not have such problems with Spark 3.1.1 and Java 8, both on the host and in the Docker image. Any reason why this is happening, taking into account that Java 11 and Spark 3.4.1 consume more resources? Essentially, is Autopilot a good fit for Spark?
The spark-submit is shown below:

spark-submit --verbose \
  --properties-file ${property_file} \
  --master k8s://https://$KUBERNETES_MASTER_IP:443 \
  --deploy-mode cluster \
  --name $APPNAME \
  --py-files $CODE_DIRECTORY_CLOUD/spark_on_gke.zip \
  --conf spark.kubernetes.namespace=$NAMESPACE \
  --conf spark.network.timeout=300 \
  --conf spark.kubernetes.allocation.batch.size=3 \
  --conf spark.kubernetes.allocation.batch.delay=1 \
  --conf spark.kubernetes.driver.container.image=${IMAGEDRIVER} \
  --conf spark.kubernetes.executor.container.image=${IMAGEDRIVER} \
  --conf spark.kubernetes.driver.pod.name=$APPNAME \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
  --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
  --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.timeout=20s \
  --conf spark.dynamicAllocation.executorIdleTimeout=30s \
  --conf spark.dynamicAllocation.cachedExecutorIdleTimeout=40s \
  --conf spark.dynamicAllocation.minExecutors=0 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  $CODE_DIRECTORY_CLOUD/${APPLICATION}

Mich Talebzadeh, Solutions Architect/Engineering Lead
London, United Kingdom

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
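For what it's worth, on Autopilot a node is provisioned to fit the pod's resource requests, so an underspecified driver pod can land on the small default node and wait for scale-up. Below is a minimal sketch of explicit driver sizing flags; the config keys are from the Spark-on-Kubernetes documentation, but the values are illustrative assumptions, not tuned recommendations.

```shell
# Explicit driver resource requests, so the autoscaler can provision a
# suitably sized node up front instead of the smallest default.
# Values are illustrative; size them to your workload.
DRIVER_SIZING="--conf spark.driver.memory=4g \
  --conf spark.kubernetes.driver.request.cores=2 \
  --conf spark.kubernetes.driver.limit.cores=2"

# These flags would be appended to the spark-submit invocation above, e.g.:
echo "spark-submit ... $DRIVER_SIZING ..."
```

Whether this actually shortens the pending-to-RUNNING window on Autopilot is something to measure; it only addresses the node-provisioning side, not JVM startup cost.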
Re: Docker images for Spark 3.1.1 and Spark 3.1.2 with Java 11 and Java 8 from docker hub
Added Docker images for Spark 3.2.1 on the default 11-jre-slim-buster base, for both spark and spark-py.

HTH
Docker images for Spark 3.1.1 and Spark 3.1.2 with Java 11 and Java 8 from docker hub
I have loaded Docker files into my repository on Docker Hub and it is public.

These are built on Spark 3.1.2 or 3.1.1, with Scala 2.12 and with Java 11 or Java 8, on the jre-slim-buster base OS. The ones built on 3.1.1 with Java 8 should work with GCP. No additional packages are added to PySpark in the Docker images.

They can be downloaded from here <https://hub.docker.com/repository/docker/michtalebzadeh/spark_dockerfiles/tags?page=1=last_updated>. The download instructions are there.

Example:

docker pull michtalebzadeh/spark_dockerfiles:spark-py-3.1.1-scala_2.12-8-jre-slim-buster

Let me know if any issues.

HTH
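A hedged convenience sketch for pulling one of these tags and confirming the Java version baked into the image. The tag is taken from the message above; overriding the entrypoint with `java -version` is an assumption about the image contents, so adjust if the image uses a different layout.

```shell
# Build the pull-and-verify command for one of the published tags.
# Running it requires a Docker daemon; here we only assemble and show it.
IMAGE="michtalebzadeh/spark_dockerfiles:spark-py-3.1.1-scala_2.12-8-jre-slim-buster"
PULL_AND_CHECK="docker pull $IMAGE && docker run --rm --entrypoint java $IMAGE -version"
echo "$PULL_AND_CHECK"
```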
Re: Submitting insert query from beeline failing on executor server with java 11
Hi Jungtaek Lim,

Thanks for the response. So we have no option but to wait till Hadoop officially supports Java 11.

Thanks and regards,
kaki mahesh raja

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Submitting insert query from beeline failing on executor server with java 11
Hmm... I read the page again, and it looks like we are in a gray area. The Hadoop community supports JDK 11 starting from Hadoop 3.3, while Spark hasn't yet moved to Hadoop 3.3 as a dependency. It may not cause a real issue at runtime with Hadoop 3.x, since Spark uses only part of Hadoop (the client layer), but it's worth knowing in any case that this is not officially supported by the Hadoop community.

On Wed, Mar 17, 2021 at 6:54 AM Jungtaek Lim wrote:
> Hadoop 2.x doesn't support JDK 11. See Hadoop Java version compatibility
> with JDK:
>
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions
>
> That said, you'll need to use Spark 3.x with Hadoop 3.1 profile to make
> Spark work with JDK 11.
Re: Submitting insert query from beeline failing on executor server with java 11
Hadoop 2.x doesn't support JDK 11. See Hadoop Java version compatibility with JDK:

https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions

That said, you'll need to use Spark 3.x with Hadoop 3.1 profile to make Spark work with JDK 11.

On Tue, Mar 16, 2021 at 10:06 PM Sean Owen wrote:
> That looks like you didn't compile with Java 11 actually. How did you try
> to do so?
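Following up on the advice above, one quick diagnostic is to check which hadoop-common jar a Spark distribution actually ships, since the stack trace points at hadoop-common-2.10.1.jar, which predates JDK 11 support. A minimal sketch, assuming a standard Spark layout with a jars/ directory; the /tmp/fake-spark directory below is only for illustration, substitute your real $SPARK_HOME.

```shell
# Report the hadoop-common version bundled with a Spark distribution.
# JDK 11 needs Hadoop 3.x client jars; a 2.x version here explains the
# sun.misc.Cleaner NoSuchMethodError seen in the stack trace above.
hadoop_common_version() {
  ls "$1"/jars/hadoop-common-*.jar 2>/dev/null \
    | sed 's|.*/hadoop-common-||; s|\.jar$||'
}

# Illustration against a fake layout (use your real $SPARK_HOME instead):
mkdir -p /tmp/fake-spark/jars
touch /tmp/fake-spark/jars/hadoop-common-3.3.4.jar
hadoop_common_version /tmp/fake-spark   # prints 3.3.4
```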
Re: Submitting insert query from beeline failing on executor server with java 11
That looks like you didn't compile with Java 11 actually. How did you try to do so?
Submitting insert query from beeline failing on executor server with java 11
Hi All,

We have compiled Spark with Java 11 ("11.0.9.1"), and when testing the thrift server we are seeing that an insert query submitted from beeline fails with the below error:

{"type":"log", "level":"ERROR", "time":"2021-03-15T05:06:09.559Z", "timezone":"UTC", "log":"Uncaught exception in thread blk_1077144750_3404529@[DatanodeInfoWithStorage[10.75.47.159:1044,DS-1678921c-3fe6-4015-9849-bd1223c23369,DISK], DatanodeInfoWithStorage[10.75.47.158:1044,DS-0b440eb7-fa7e-4ad8-bb5a-cdc50f3e7660,DISK]]"}
java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'
    at org.apache.hadoop.crypto.CryptoStreamUtils.freeDB(CryptoStreamUtils.java:40) ~[hadoop-common-2.10.1.jar:?]
    at org.apache.hadoop.crypto.CryptoInputStream.freeBuffers(CryptoInputStream.java:780) ~[hadoop-common-2.10.1.jar:?]
    at org.apache.hadoop.crypto.CryptoInputStream.close(CryptoInputStream.java:322) ~[hadoop-common-2.10.1.jar:?]
    at java.io.FilterInputStream.close(FilterInputStream.java:180) ~[?:?]
    at org.apache.hadoop.hdfs.DataStreamer.closeStream(DataStreamer.java:1003) ~[hadoop-hdfs-client-2.10.1.jar:?]
    at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:845) ~[hadoop-hdfs-client-2.10.1.jar:?]
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:840) ~[hadoop-hdfs-client-2.10.1.jar:?]
{"type":"log", "level":"DEBUG", "time":"2021-03-15T05:06:09.570Z", "timezone":"UTC", "log":"unwrapping token of length:54"}
{"type":"log", "level":"DEBUG", "time":"2021-03-15T05:06:09.599Z", "timezone":"UTC", "log":"IPC Client (1437736861) connection to vm-10-75-47-157/10.75.47.157:8020 from cspk got value #4"}

Any inputs on how to fix this issue would be helpful for us.

Thanks and Regards,
kaki mahesh raja
Re: Spark 3.0.1 giving warning while running with Java 11
Sure Sean. Thanks for the confirmation.

On Fri, 15 Jan 2021, 10:57 Sean Owen wrote:
> You can ignore that. Spark 3.x works with Java 11 but it will generate
> some warnings that are safe to disregard.
Re: Spark 3.0.1 giving warning while running with Java 11
You can ignore that. Spark 3.x works with Java 11 but it will generate some warnings that are safe to disregard.
Spark 3.0.1 giving warning while running with Java 11
Hi All,

Getting a warning while running Spark 3.0.1 with Java 11:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

Kind Regards,
Sachit Murarka
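If the warnings are a nuisance in logs, one option is to open the JDK internals explicitly via Spark's extraJavaOptions. This is a hedged sketch: the module/package names below are assumptions inferred from the warning text (DirectByteBuffer lives in java.nio), not an exhaustive list, and later Spark releases pass similar --add-opens flags themselves.

```shell
# Open the JDK internals that Spark's Platform class reflects into, so the
# JVM does not print the illegal-access warnings. Flags are illustrative;
# here we only assemble the spark-submit conf string and show it.
OPEN_FLAGS="--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
SUBMIT_CONF="--conf spark.driver.extraJavaOptions='$OPEN_FLAGS' --conf spark.executor.extraJavaOptions='$OPEN_FLAGS'"
echo "$SUBMIT_CONF"
```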
Re: Spark Compatibility with Java 11
Thanks a lot.

--
Regards
Ankur Mittal
Re: Spark Compatibility with Java 11
Hi Ankur,

Java 11 support was added in Spark 3.0.
https://issues.apache.org/jira/browse/SPARK-24417

Thanks,
Spark Compatibility with Java 11
Hi,

I am using Spark 2.x and need to run with Java 11, but Spark 2.x is not able to run on Java 11. Is there any way we can use Java 11 with Spark 2.x? Has this issue been resolved in Spark 3.0?

--
Regards
Ankur Mittal
Re: Java 11 support in Spark 2.5
From this thread (http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27966), it looks like there is no confirmation yet whether Spark 2.5 would have JDK 11 support at all. Spark 3 will most likely be out soon (tentatively this quarter, as per the mailing list), and Spark 3 is going to have JDK 11 support.

From: Sinha, Breeta (Nokia - IN/Bangalore)
Sent: Thursday, January 2, 2020 12:48 PM
To: user@spark.apache.org
Cc: Rao, Abhishek (Nokia - IN/Bangalore); Imandi, Srinivas (Nokia - IN/Bangalore)
Subject: Java 11 support in Spark 2.5
Java 11 support in Spark 2.5
Hi All, Wanted to know if Java 11 support is added in Spark 2.5. If so, what is the expected timeline for Spark 2.5 release? Kind Regards, Breeta Sinha
Re: can Spark 2.4 work on JDK 11?
Not officially. We have seen problems with JDK 10 as well. It would be great if you or someone would like to contribute to get it to work.
can Spark 2.4 work on JDK 11?
Hi All, can Spark 2.4 work on JDK 11? I feel like there are a lot of features added in JDK 9, 10 and 11 that can make the deployment process a whole lot better, and of course some more syntactic sugar similar to Scala. Thanks!
ApacheCon CFP closing soon (11 February)
Hello, fellow Apache enthusiast. Thanks for your participation in, and interest in, the projects of the Apache Software Foundation. I wanted to remind you that the Call For Papers (CFP) for ApacheCon North America, and Apache: Big Data North America, closes in less than a month. If you've been putting it off because there was lots of time left, it's time to dig for that inspiration and get those talk proposals in. It's also time to discuss with your developer and user community whether there's a track of talks that you might want to propose, so that you have more complete coverage of your project than a talk or two.

We're looking for talks directly, and indirectly, related to projects at the Apache Software Foundation. These can be anything from in-depth technical discussions of the projects you work with, to talks about community, documentation, legal issues, marketing, and so on. We're also very interested in talks about projects and services built on top of Apache projects, and case studies of how you use Apache projects to solve real-world problems. We are particularly interested in presentations from Apache projects either in the Incubator, or recently graduated. ApacheCon is where people come to find out what technology they'll be using this time next year.

Important URLs are:

To submit a talk for Apache: Big Data - http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp
To submit a talk for ApacheCon - http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
To register for Apache: Big Data - http://events.linuxfoundation.org/events/apache-big-data-north-america/attend/register-
To register for ApacheCon - http://events.linuxfoundation.org/events/apachecon-north-america/attend/register-

Early Bird registration rates end March 12th, but if you're a committer on an Apache project, you get the low committer rate, which is less than half of the early bird rate!
For further updates about ApacheCon, follow us on Twitter, @ApacheCon, or drop by our IRC channel, #apachecon on the Freenode IRC network. Or contact me - rbo...@apache.org - with any questions or concerns.

Thanks!

Rich Bowen, VP Conferences, Apache Software Foundation
Re: Schedule lunchtime today for a free webinar IoT data ingestion in Spark Streaming using Kaa 11 a.m. PDT (2 p.m. EDT)
Hi there!

If you missed our webinar on IoT data ingestion in Spark with KaaIoT, see the video and slides here: http://goo.gl/VMyQ1M

We recorded our webinar on "IoT data ingestion in Spark Streaming using Kaa" for those who couldn't see it live or who would like to refresh what they have learned. During the webinar, we explained and illustrated how Kaa and Spark can be effectively used together to address the challenges of IoT data gathering and analysis. In this video, you will find highly crystallized, practical instruction on setting up your own stream analytics solution with Kaa and Spark.

Best wishes,
Oleh Rozvadovskyy
CyberVision Inc.
Schedule lunchtime today for a free webinar IoT data ingestion in Spark Streaming using Kaa 11 a.m. PDT (2 p.m. EDT)
Hi there!

Only a couple of hours left until our first webinar on *IoT data ingestion in Spark Streaming using Kaa*. During the webinar we will build a solution that ingests real-time data from an Intel Edison into Apache Spark for stream processing. This solution includes client, middleware, and analytics components. All of these software components are 100% open-source; therefore, the solution described in this tutorial can be used as a prototype for even a commercial product. Those who are interested, please feel free to sign up here: https://goo.gl/rgWuj6

Best wishes,
Oleh Rozvadovskyy
CyberVision Inc.
Re: Guava 11 dependency issue in Spark 1.2.0
I have recently encountered a similar problem with a Guava version collision with Hadoop. Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are they staying on version 11, does anyone know?

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Wed, Jan 7, 2015 at 7:59 AM, Niranda Perera niranda.per...@gmail.com wrote:

Hi Sean,

I removed the hadoop dependencies from the app and ran it on the cluster. It gives a java.io.EOFException:

15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(177166) called with curMem=0, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 173.0 KB, free 1911.2 MB)
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(25502) called with curMem=177166, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.9 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.100.5.109:43924 (size: 24.9 KB, free: 1911.3 MB)
15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/01/07 11:19:29 INFO SparkContext: Created broadcast 0 from hadoopFile at AvroRelation.scala:45
15/01/07 11:19:29 INFO FileInputFormat: Total input paths to process : 1
15/01/07 11:19:29 INFO SparkContext: Starting job: collect at SparkPlan.scala:84
15/01/07 11:19:29 INFO DAGScheduler: Got job 0 (collect at SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
15/01/07 11:19:29 INFO DAGScheduler: Final stage: Stage 0(collect at SparkPlan.scala:84)
15/01/07 11:19:29 INFO DAGScheduler: Parents of final stage: List()
15/01/07 11:19:29 INFO DAGScheduler: Missing parents: List()
15/01/07 11:19:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at map at SparkPlan.scala:84), which has no missing parents
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(4864) called with curMem=202668, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(3481) called with curMem=207532, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.100.5.109:43924 (size: 3.4 KB, free: 1911.3 MB)
15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/07 11:19:29 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/01/07 11:19:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[6] at map at SparkPlan.scala:84)
15/01/07 11:19:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/01/07 11:19:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
15/01/07 11:19:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
15/01/07 11:19:29 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 10.100.5.109): java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2722)
    at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1009)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
    at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
    at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
    at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
    at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1969)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893
Re: Guava 11 dependency issue in Spark 1.2.0
Actually, there is already someone on hadoop-common-dev taking care of removing the old Guava dependency:
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201501.mbox/browser
https://issues.apache.org/jira/browse/HADOOP-11470

Romi Kuntsman, Big Data Engineer
http://www.totango.com

On Mon, Jan 19, 2015 at 4:03 PM, Romi Kuntsman r...@totango.com wrote:

I have recently encountered a similar problem with a Guava version collision with Hadoop. Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are they staying on version 11, does anyone know?

Romi Kuntsman, Big Data Engineer
http://www.totango.com

On Wed, Jan 7, 2015 at 7:59 AM, Niranda Perera niranda.per...@gmail.com wrote:

Hi Sean,

I removed the hadoop dependencies from the app and ran it on the cluster. It gives a java.io.EOFException:

15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(177166) called with curMem=0, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 173.0 KB, free 1911.2 MB)
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(25502) called with curMem=177166, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.9 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.100.5.109:43924 (size: 24.9 KB, free: 1911.3 MB)
15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/01/07 11:19:29 INFO SparkContext: Created broadcast 0 from hadoopFile at AvroRelation.scala:45
15/01/07 11:19:29 INFO FileInputFormat: Total input paths to process : 1
15/01/07 11:19:29 INFO SparkContext: Starting job: collect at SparkPlan.scala:84
15/01/07 11:19:29 INFO DAGScheduler: Got job 0 (collect at SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
15/01/07 11:19:29 INFO DAGScheduler: Final stage: Stage 0(collect at SparkPlan.scala:84)
15/01/07 11:19:29 INFO DAGScheduler: Parents of final stage: List()
15/01/07 11:19:29 INFO DAGScheduler: Missing parents: List()
15/01/07 11:19:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at map at SparkPlan.scala:84), which has no missing parents
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(4864) called with curMem=202668, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(3481) called with curMem=207532, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.100.5.109:43924 (size: 3.4 KB, free: 1911.3 MB)
15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/07 11:19:29 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/01/07 11:19:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[6] at map at SparkPlan.scala:84)
15/01/07 11:19:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/01/07 11:19:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
15/01/07 11:19:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
15/01/07 11:19:29 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 10.100.5.109): java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2722)
    at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1009)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
    at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
    at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
    at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
    at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
    at java.io.ObjectInputStream.readSerialData
Re: Guava 11 dependency issue in Spark 1.2.0
Please see this thread:
http://search-hadoop.com/m/LgpTk2aVYgr/Hadoop+guava+upgradesubj=Re+Time+to+address+the+Guava+version+problem

On Jan 19, 2015, at 6:03 AM, Romi Kuntsman r...@totango.com wrote:

I have recently encountered a similar problem with a Guava version collision with Hadoop. Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are they staying on version 11, does anyone know?

Romi Kuntsman, Big Data Engineer
http://www.totango.com

[quoted java.io.EOFException log and stack trace from Niranda Perera's message trimmed; it appears in full earlier in this thread]
Re: Guava 11 dependency issue in Spark 1.2.0
-dev

Guava was not downgraded to 11. That PR was not merged. It was part of a discussion about, indeed, what to do about potential Guava version conflicts. Spark uses Guava, but so does Hadoop, and so do user programs. Spark in fact uses 14.0.1:
https://github.com/apache/spark/blob/master/pom.xml#L330

This is a symptom of a conflict between Spark's Guava 14 and Hadoop's Guava 11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as well. Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a lot of these problems are solved. As we've seen, though, this one is tricky.

What's your Spark version, and what are you executing? What mode -- standalone, YARN? What Hadoop version?

On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera niranda.per...@gmail.com wrote:

Hi,

I have been running a simple Spark app on a local spark cluster and I came across this error:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
    at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
    [remainder of the stack trace trimmed; it appears in full in the original message later in this thread]

While looking into this I found out that Guava was downgraded to version 11 in this PR: https://github.com/apache/spark/pull/1610. In this PR, the hashInt call at OpenHashSet.scala:261 has been changed to hashLong. But when I actually run my app, the java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt error occurs, which is understandable because hashInt is not available before Guava 12. So I'm wondering why this occurs?

Cheers
--
Niranda Perera
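[Editor's note: Sean mentions that Spark shades Guava as of 1.2.0. An application can apply the same technique to its own Guava dependency. The following pom.xml fragment is an illustrative sketch, not taken from Spark's build; the relocation prefix "myapp.shaded" is an arbitrary example name.]

<!-- Relocate the app's Guava classes so they cannot collide with the
     Guava 11 that hadoop-common puts on the classpath. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.2</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>

After shading, the application's bytecode references myapp.shaded.com.google.common.*, so whichever unshaded Guava Hadoop supplies at runtime no longer matters to the app's own calls.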
Re: Guava 11 dependency issue in Spark 1.2.0
Hi Sean,

My mistake, the Guava 11 dependency came from hadoop-common indeed. I'm running the following simple app on a Spark 1.2.0 standalone local cluster (2 workers) with Hadoop 1.2.1:

public class AvroSparkTest {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf()
                .setMaster("spark://niranda-ThinkPad-T540p:7077") // ("local[2]")
                .setAppName("avro-spark-test");

        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        JavaSQLContext sqlContext = new JavaSQLContext(sparkContext);

        JavaSchemaRDD episodes = AvroUtils.avroFile(sqlContext,
                "/home/niranda/projects/avro-spark-test/src/test/resources/episodes.avro");
        episodes.printSchema();
        episodes.registerTempTable("avroTable");

        List<Row> result = sqlContext.sql("SELECT * FROM avroTable").collect();
        for (Row row : result) {
            System.out.println(row.toString());
        }
    }
}

As you pointed out, this error occurs when the hadoop dependency is added; it runs without a problem when the hadoop dependency is removed and the master is set to local[2].

Cheers

On Tue, Jan 6, 2015 at 3:23 PM, Sean Owen so...@cloudera.com wrote:

-dev

Guava was not downgraded to 11. That PR was not merged. It was part of a discussion about, indeed, what to do about potential Guava version conflicts. Spark uses Guava, but so does Hadoop, and so do user programs. Spark in fact uses 14.0.1:
https://github.com/apache/spark/blob/master/pom.xml#L330

This is a symptom of a conflict between Spark's Guava 14 and Hadoop's Guava 11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as well. Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a lot of these problems are solved. As we've seen, though, this one is tricky.

What's your Spark version, and what are you executing? What mode -- standalone, YARN? What Hadoop version?

[quoted original message and java.lang.NoSuchMethodError stack trace trimmed; they appear in full elsewhere in this thread]
Re: Guava 11 dependency issue in Spark 1.2.0
Oh, are you actually bundling Hadoop in your app? That may be the problem. If you're using stand-alone mode, why include Hadoop at all? In any event, Spark and Hadoop are intended to be 'provided' dependencies in the app you send to spark-submit.

On Tue, Jan 6, 2015 at 10:15 AM, Niranda Perera niranda.per...@gmail.com wrote:

Hi Sean,

My mistake, the Guava 11 dependency came from hadoop-common indeed. I'm running the following simple app on a Spark 1.2.0 standalone local cluster (2 workers) with Hadoop 1.2.1.

[quoted AvroSparkTest code and earlier messages trimmed; they appear in full elsewhere in this thread]
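[Editor's note: Sean's advice to treat Spark and Hadoop as 'provided' dependencies can be sketched as the following pom.xml fragment. It is illustrative only; the artifact versions shown are the ones discussed in this thread.]

<!-- 'provided' scope: compile against Spark and Hadoop, but do not bundle
     them into the application jar; the cluster supplies them at runtime. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.2.0</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.2.1</version>
  <scope>provided</scope>
</dependency>

Because hadoop-client is no longer packaged with the app, its transitive Guava 11 stops being bundled as well, which avoids the collision described above.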
Guava 11 dependency issue in Spark 1.2.0
Hi,

I have been running a simple Spark app on a local spark cluster and I came across this error:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
    at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
    at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
    at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
    at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
    at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
    at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
    at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
    at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
    at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
    at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
    at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
    at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
    at org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
    at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
    at com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
    at com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
    at org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
    at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
    at org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)

While looking into this I found out that Guava was downgraded to version 11 in this PR: https://github.com/apache/spark/pull/1610. In this PR, the hashInt call at OpenHashSet.scala:261 has been changed to hashLong. But when I actually run my app, the java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt error occurs, which is understandable because hashInt is not available before Guava 12. So I'm wondering why this occurs?

Cheers
--
Niranda Perera
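[Editor's note: when chasing a NoSuchMethodError like the one above, a quick way to see which jar "won" the classpath fight is to ask the JVM where it loaded the offending class from. The sketch below is a hypothetical standalone helper, not part of Spark or Guava; the class name WhichJar is invented for illustration.]

```java
// Minimal diagnostic sketch: report where a class was loaded from, to spot
// which jar supplied it when two versions (e.g. Guava 11 vs 14) collide.
public class WhichJar {

    static String locate(Class<?> cls) {
        java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the JDK's bootstrap/platform loader have no CodeSource.
        return src == null ? "bootstrap/platform classloader"
                           : src.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // On a real Spark driver classpath you would inspect the class whose
        // method appears to be missing, e.g.:
        //   locate(Class.forName("com.google.common.hash.HashFunction"))
        // Here we demonstrate with a JDK class, which reports the bootstrap loader.
        System.out.println(locate(String.class));
    }
}
```

If the printed location is a Hadoop lib directory rather than the application jar, the older Guava is shadowing the newer one.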
Hive 11 / CDH 4.6 / Spark 0.9.1 dilemma
I posted this in the cdh-user mailing list yesterday and think this should have been the right audience for it:

=====

Hi All,

Not sure if anyone else has faced this same issue or not.

We installed CDH 4.6, which uses Hive 0.10. And we have Spark 0.9.1, which comes with Hive 0.11. Now our Hive jobs that work on CDH fail in Shark.

Is anyone else facing the same issue, and are there any work-arounds? Can we re-compile Shark 0.9.1 with Hive 0.10, or compile Hive 0.11 on CDH 4.6?

Thanks,
Anurag Tangri
Re: Hive 11 / CDH 4.6 / Spark 0.9.1 dilemma
I haven't tried any of this, mind you, but my guess is that your options, from least painful and most likely to work onwards, are:

- Get Spark / Shark to compile against Hive 0.10
- Shade Hive 0.11 into Spark
- Update to CDH 5.0+

I don't think there will be more updated releases of Shark or Spark-on-CDH4, so you may want to be moving forward anyway.

On Thu, Aug 7, 2014 at 12:46 AM, Anurag Tangri atan...@groupon.com wrote:

I posted this in the cdh-user mailing list yesterday and think this should have been the right audience for it:

=====

Hi All,

Not sure if anyone else has faced this same issue or not. We installed CDH 4.6, which uses Hive 0.10. And we have Spark 0.9.1, which comes with Hive 0.11. Now our Hive jobs that work on CDH fail in Shark. Is anyone else facing the same issue, and are there any work-arounds? Can we re-compile Shark 0.9.1 with Hive 0.10, or compile Hive 0.11 on CDH 4.6?

Thanks,
Anurag Tangri