Spark 3.4.1 with Java 11 performance on k8s serverless/autopilot

2023-08-07 Thread Mich Talebzadeh
Hi,

I would like to share my experience with Spark 3.4.1 running on k8s autopilot,
or what some refer to as serverless.

My current experience is with Google GKE Autopilot
<https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview>.
Essentially you specify the cluster name and region and the CSP takes care of
the rest. FYI, I am running Java 11 and Spark 3.4.1 on the host submitting
spark-submit. The Docker image is also built on Java 11, Spark 3.4.1 and
PySpark.
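
For reference, a minimal sketch of creating such a cluster (the cluster name
and region match the output further down; the command itself is the standard
gcloud one for Autopilot and is not taken from the original post):

# create a GKE Autopilot cluster; node provisioning is handled by Google
gcloud container clusters create-auto spark-on-gke --region europe-west2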

The image tag explains it:

spark-py:3.4.1-scala_2.12-11-jre-slim-buster-java11PlusPackages

The problem I notice is that the cluster starts with an e2-medium node, which
has 4GB of memory:

NAME          LOCATION      MASTER_VERSION   MASTER_IP       MACHINE_TYPE  NODE_VERSION     NUM_NODES  STATUS
spark-on-gke  europe-west2  1.27.2-gke.1200  34.147.184.xxx  e2-medium     1.27.2-gke.1200  1          RUNNING

This means the driver starts with that configuration, and it sometimes takes
more than three minutes for the driver to go into the RUNNING state. In
contrast, I do not see such problems with Spark 3.1.1 and Java 8, both on the
host and in the Docker image. Any idea why this is happening, taking into
account that Java 11 and Spark 3.4.1 consume more resources? Essentially, is
Autopilot a good fit for Spark?

The spark-submit command is shown below:

spark-submit --verbose \
   --properties-file ${property_file} \
   --master k8s://https://$KUBERNETES_MASTER_IP:443 \
   --deploy-mode cluster \
   --name $APPNAME \
   --py-files $CODE_DIRECTORY_CLOUD/spark_on_gke.zip \
   --conf spark.kubernetes.namespace=$NAMESPACE \
   --conf spark.network.timeout=300 \
   --conf spark.kubernetes.allocation.batch.size=3 \
   --conf spark.kubernetes.allocation.batch.delay=1 \
   --conf spark.kubernetes.driver.container.image=${IMAGEDRIVER} \
   --conf spark.kubernetes.executor.container.image=${IMAGEDRIVER} \
   --conf spark.kubernetes.driver.pod.name=$APPNAME \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
   --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
   --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
   --conf spark.dynamicAllocation.enabled=true \
   --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
   --conf spark.dynamicAllocation.shuffleTracking.timeout=20s \
   --conf spark.dynamicAllocation.executorIdleTimeout=30s \
   --conf spark.dynamicAllocation.cachedExecutorIdleTimeout=40s \
   --conf spark.dynamicAllocation.minExecutors=0 \
   --conf spark.dynamicAllocation.maxExecutors=20 \
   $CODE_DIRECTORY_CLOUD/${APPLICATION}
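
One thing that may help on Autopilot (a sketch, not part of the job above):
Autopilot sizes nodes from the pod resource requests, so explicitly requesting
driver and executor resources should stop the driver from being scheduled on a
small default node. The values below are illustrative only and would be added
to the spark-submit command:

   # illustrative resource requests; tune to the workload
   --conf spark.driver.memory=4g \
   --conf spark.driver.cores=2 \
   --conf spark.kubernetes.driver.request.cores=2 \
   --conf spark.executor.memory=4g \
   --conf spark.kubernetes.executor.request.cores=2 \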



Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Docker images for Spark 3.1.1 and Spark 3.1.2 with Java 11 and Java 8 from docker hub

2022-02-20 Thread Mich Talebzadeh
Added Docker images for Spark 3.2.1 with the default 11-jre-slim-buster base,
for spark and spark-py.


HTH


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sun, 20 Feb 2022 at 13:50, Mich Talebzadeh 
wrote:

> I have loaded docker files into my docker repository on docker hub and it
> is public.
>
>
> These are built on Spark 3.1.2 OR 3.1.1, with Scala 2.12 and with Java 11
> OR  Java 8 on OS jre-slim-buster. The ones built on 3.1.1 with Java 8
> should work with GCP
>
>
> No additional packages are added to PySpark in docker.
>
>
> They can be downloaded from here
> <https://hub.docker.com/repository/docker/michtalebzadeh/spark_dockerfiles/tags?page=1=last_updated>
>
>
> How to download. The instructions are there.
>
>
> Example:
>
>
> docker pull
> michtalebzadeh/spark_dockerfiles:spark-py-3.1.1-scala_2.12-8-jre-slim-buster
>
>
> Let me know if any issues
>
>
> HTH
>
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>


Docker images for Spark 3.1.1 and Spark 3.1.2 with Java 11 and Java 8 from docker hub

2022-02-20 Thread Mich Talebzadeh
I have loaded docker files into my docker repository on docker hub and it
is public.


These are built on Spark 3.1.2 or 3.1.1, with Scala 2.12 and with Java 11 or
Java 8, on a jre-slim-buster base OS. The ones built on 3.1.1 with Java 8
should work with GCP.


No additional packages are added to PySpark in docker.


They can be downloaded from here
<https://hub.docker.com/repository/docker/michtalebzadeh/spark_dockerfiles/tags?page=1=last_updated>


The download instructions are on that page.


Example:


docker pull michtalebzadeh/spark_dockerfiles:spark-py-3.1.1-scala_2.12-8-jre-slim-buster
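
For anyone trying one of these images on Kubernetes, a minimal sketch of
pointing spark-submit at it (the API endpoint, namespace, service account and
example path are placeholders/assumptions, not from this post; only the image
tag above is real):

spark-submit \
   --master k8s://https://<k8s-api-host>:443 \
   --deploy-mode cluster \
   --name pyspark-pi-test \
   --conf spark.kubernetes.namespace=default \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
   --conf spark.kubernetes.container.image=michtalebzadeh/spark_dockerfiles:spark-py-3.1.1-scala_2.12-8-jre-slim-buster \
   local:///opt/spark/examples/src/main/python/pi.py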


Let me know if any issues


HTH


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Submitting insert query from beeline failing on executor server with java 11

2021-03-17 Thread kaki mahesh raja
Hi Jungtaek Lim,

Thanks for the response. So we have no option but to wait till Hadoop
officially supports Java 11.


Thanks and regards,
kaki mahesh raja



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Submitting insert query from beeline failing on executor server with java 11

2021-03-16 Thread Jungtaek Lim
Hmm... I read the page again, and it looks like we are in a gray area.

The Hadoop community supports JDK 11 starting from Hadoop 3.3, while we haven't
yet added Hadoop 3.3 as a dependency. It may not cause a real issue at runtime
with Hadoop 3.x, as Spark uses only part of Hadoop (the client layer), but it
is worth knowing that this is not officially supported by the Hadoop community.

On Wed, Mar 17, 2021 at 6:54 AM Jungtaek Lim 
wrote:

> Hadoop 2.x doesn't support JDK 11. See Hadoop Java version compatibility
> with JDK:
>
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions
>
> That said, you'll need to use Spark 3.x with Hadoop 3.1 profile to make
> Spark work with JDK 11.
>
> On Tue, Mar 16, 2021 at 10:06 PM Sean Owen  wrote:
>
>> That looks like you didn't compile with Java 11 actually. How did you try
>> to do so?
>>
>> On Tue, Mar 16, 2021, 7:50 AM kaki mahesh raja <
>> kaki.mahesh_r...@nokia.com> wrote:
>>
>>> HI All,
>>>
>>> We have compiled spark with java 11 ("11.0.9.1") and when testing the
>>> thrift
>>> server we are seeing that insert query from operator using beeline
>>> failing
>>> with the below error.
>>>
>>> {"type":"log", "level":"ERROR", "time":"2021-03-15T05:06:09.559Z",
>>> "timezone":"UTC", "log":"Uncaught exception in thread
>>> blk_1077144750_3404529@[DatanodeInfoWithStorage[10.75.47.159:1044
>>> ,DS-1678921c-3fe6-4015-9849-bd1223c23369,DISK],
>>> DatanodeInfoWithStorage[10.75.47.158:1044
>>> ,DS-0b440eb7-fa7e-4ad8-bb5a-cdc50f3e7660,DISK]]"}
>>> java.lang.NoSuchMethodError: 'sun.misc.Cleaner
>>> sun.nio.ch.DirectBuffer.cleaner()'
>>> at
>>>
>>> org.apache.hadoop.crypto.CryptoStreamUtils.freeDB(CryptoStreamUtils.java:40)
>>> ~[hadoop-common-2.10.1.jar:?]
>>> at
>>>
>>> org.apache.hadoop.crypto.CryptoInputStream.freeBuffers(CryptoInputStream.java:780)
>>> ~[hadoop-common-2.10.1.jar:?]
>>> at
>>>
>>> org.apache.hadoop.crypto.CryptoInputStream.close(CryptoInputStream.java:322)
>>> ~[hadoop-common-2.10.1.jar:?]
>>> at java.io.FilterInputStream.close(FilterInputStream.java:180)
>>> ~[?:?]
>>> at
>>> org.apache.hadoop.hdfs.DataStreamer.closeStream(DataStreamer.java:1003)
>>> ~[hadoop-hdfs-client-2.10.1.jar:?]
>>> at
>>> org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:845)
>>> ~[hadoop-hdfs-client-2.10.1.jar:?]
>>> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:840)
>>> ~[hadoop-hdfs-client-2.10.1.jar:?]
>>> {"type":"log", "level":"DEBUG", "time":"2021-03-15T05:06:09.570Z",
>>> "timezone":"UTC", "log":"unwrapping token of length:54"}
>>> {"type":"log", "level":"DEBUG", "time":"2021-03-15T05:06:09.599Z",
>>> "timezone":"UTC", "log":"IPC Client (1437736861) connection to
>>> vm-10-75-47-157/10.75.47.157:8020 from cspk got value #4"}
>>>
>>> Any inputs on how to fix this issue would be helpful for us.
>>>
>>> Thanks and Regards,
>>> kaki mahesh raja
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>>
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>


Re: Submitting insert query from beeline failing on executor server with java 11

2021-03-16 Thread Jungtaek Lim
Hadoop 2.x doesn't support JDK 11. See Hadoop Java version compatibility
with JDK:

https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions

That said, you'll need to use Spark 3.x with Hadoop 3.1 profile to make
Spark work with JDK 11.
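
A minimal sketch of such a build (an assumption for illustration: the exact
profile name depends on the Spark version, e.g. the Hadoop 3 profile is called
hadoop-3.2 in Spark 3.0/3.1, and the Hive profiles are included here only
because this thread concerns the Thrift server):

# run from a Spark 3.x source checkout
./dev/make-distribution.sh --name hadoop3-jdk11 --tgz \
    -Phadoop-3.2 -Phive -Phive-thriftserver -Pyarn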

On Tue, Mar 16, 2021 at 10:06 PM Sean Owen  wrote:

> That looks like you didn't compile with Java 11 actually. How did you try
> to do so?
>
> On Tue, Mar 16, 2021, 7:50 AM kaki mahesh raja 
> wrote:
>
>> HI All,
>>
>> We have compiled spark with java 11 ("11.0.9.1") and when testing the
>> thrift
>> server we are seeing that insert query from operator using beeline
>> failing
>> with the below error.
>>
>> {"type":"log", "level":"ERROR", "time":"2021-03-15T05:06:09.559Z",
>> "timezone":"UTC", "log":"Uncaught exception in thread
>> blk_1077144750_3404529@[DatanodeInfoWithStorage[10.75.47.159:1044
>> ,DS-1678921c-3fe6-4015-9849-bd1223c23369,DISK],
>> DatanodeInfoWithStorage[10.75.47.158:1044
>> ,DS-0b440eb7-fa7e-4ad8-bb5a-cdc50f3e7660,DISK]]"}
>> java.lang.NoSuchMethodError: 'sun.misc.Cleaner
>> sun.nio.ch.DirectBuffer.cleaner()'
>> at
>>
>> org.apache.hadoop.crypto.CryptoStreamUtils.freeDB(CryptoStreamUtils.java:40)
>> ~[hadoop-common-2.10.1.jar:?]
>> at
>>
>> org.apache.hadoop.crypto.CryptoInputStream.freeBuffers(CryptoInputStream.java:780)
>> ~[hadoop-common-2.10.1.jar:?]
>> at
>>
>> org.apache.hadoop.crypto.CryptoInputStream.close(CryptoInputStream.java:322)
>> ~[hadoop-common-2.10.1.jar:?]
>> at java.io.FilterInputStream.close(FilterInputStream.java:180)
>> ~[?:?]
>> at
>> org.apache.hadoop.hdfs.DataStreamer.closeStream(DataStreamer.java:1003)
>> ~[hadoop-hdfs-client-2.10.1.jar:?]
>> at
>> org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:845)
>> ~[hadoop-hdfs-client-2.10.1.jar:?]
>> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:840)
>> ~[hadoop-hdfs-client-2.10.1.jar:?]
>> {"type":"log", "level":"DEBUG", "time":"2021-03-15T05:06:09.570Z",
>> "timezone":"UTC", "log":"unwrapping token of length:54"}
>> {"type":"log", "level":"DEBUG", "time":"2021-03-15T05:06:09.599Z",
>> "timezone":"UTC", "log":"IPC Client (1437736861) connection to
>> vm-10-75-47-157/10.75.47.157:8020 from cspk got value #4"}
>>
>> Any inputs on how to fix this issue would be helpful for us.
>>
>> Thanks and Regards,
>> kaki mahesh raja
>>
>>
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>


Re: Submitting insert query from beeline failing on executor server with java 11

2021-03-16 Thread Sean Owen
That looks like you didn't compile with Java 11 actually. How did you try
to do so?
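
Before rebuilding, it may also be worth checking what the running process
actually uses; the stack trace in the quoted message shows hadoop-common-2.10.1
on the classpath, and Hadoop 2.x does not support JDK 11. A generic diagnostic
sketch (paths assume a standard Spark distribution layout):

# which JDK spark-submit / the thrift server will pick up
java -version
echo $JAVA_HOME

# which Hadoop client jars are bundled with this Spark build
ls $SPARK_HOME/jars | grep -E 'hadoop-(common|client|hdfs)'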

On Tue, Mar 16, 2021, 7:50 AM kaki mahesh raja 
wrote:

> HI All,
>
> We have compiled spark with java 11 ("11.0.9.1") and when testing the
> thrift
> server we are seeing that insert query from operator using beeline failing
> with the below error.
>
> {"type":"log", "level":"ERROR", "time":"2021-03-15T05:06:09.559Z",
> "timezone":"UTC", "log":"Uncaught exception in thread
> blk_1077144750_3404529@[DatanodeInfoWithStorage[10.75.47.159:1044
> ,DS-1678921c-3fe6-4015-9849-bd1223c23369,DISK],
> DatanodeInfoWithStorage[10.75.47.158:1044
> ,DS-0b440eb7-fa7e-4ad8-bb5a-cdc50f3e7660,DISK]]"}
> java.lang.NoSuchMethodError: 'sun.misc.Cleaner
> sun.nio.ch.DirectBuffer.cleaner()'
> at
>
> org.apache.hadoop.crypto.CryptoStreamUtils.freeDB(CryptoStreamUtils.java:40)
> ~[hadoop-common-2.10.1.jar:?]
> at
>
> org.apache.hadoop.crypto.CryptoInputStream.freeBuffers(CryptoInputStream.java:780)
> ~[hadoop-common-2.10.1.jar:?]
> at
>
> org.apache.hadoop.crypto.CryptoInputStream.close(CryptoInputStream.java:322)
> ~[hadoop-common-2.10.1.jar:?]
> at java.io.FilterInputStream.close(FilterInputStream.java:180)
> ~[?:?]
> at
> org.apache.hadoop.hdfs.DataStreamer.closeStream(DataStreamer.java:1003)
> ~[hadoop-hdfs-client-2.10.1.jar:?]
> at
> org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:845)
> ~[hadoop-hdfs-client-2.10.1.jar:?]
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:840)
> ~[hadoop-hdfs-client-2.10.1.jar:?]
> {"type":"log", "level":"DEBUG", "time":"2021-03-15T05:06:09.570Z",
> "timezone":"UTC", "log":"unwrapping token of length:54"}
> {"type":"log", "level":"DEBUG", "time":"2021-03-15T05:06:09.599Z",
> "timezone":"UTC", "log":"IPC Client (1437736861) connection to
> vm-10-75-47-157/10.75.47.157:8020 from cspk got value #4"}
>
> Any inputs on how to fix this issue would be helpful for us.
>
> Thanks and Regards,
> kaki mahesh raja
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Submitting insert query from beeline failing on executor server with java 11

2021-03-16 Thread kaki mahesh raja
Hi All,

We have compiled Spark with Java 11 ("11.0.9.1") and when testing the Thrift
server we are seeing that an insert query run by an operator using beeline
fails with the error below.

{"type":"log", "level":"ERROR", "time":"2021-03-15T05:06:09.559Z",
"timezone":"UTC", "log":"Uncaught exception in thread
blk_1077144750_3404529@[DatanodeInfoWithStorage[10.75.47.159:1044,DS-1678921c-3fe6-4015-9849-bd1223c23369,DISK],
DatanodeInfoWithStorage[10.75.47.158:1044,DS-0b440eb7-fa7e-4ad8-bb5a-cdc50f3e7660,DISK]]"}
java.lang.NoSuchMethodError: 'sun.misc.Cleaner
sun.nio.ch.DirectBuffer.cleaner()'
at
org.apache.hadoop.crypto.CryptoStreamUtils.freeDB(CryptoStreamUtils.java:40)
~[hadoop-common-2.10.1.jar:?]
at
org.apache.hadoop.crypto.CryptoInputStream.freeBuffers(CryptoInputStream.java:780)
~[hadoop-common-2.10.1.jar:?]
at
org.apache.hadoop.crypto.CryptoInputStream.close(CryptoInputStream.java:322)
~[hadoop-common-2.10.1.jar:?]
at java.io.FilterInputStream.close(FilterInputStream.java:180)
~[?:?]
at
org.apache.hadoop.hdfs.DataStreamer.closeStream(DataStreamer.java:1003)
~[hadoop-hdfs-client-2.10.1.jar:?]
at
org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:845)
~[hadoop-hdfs-client-2.10.1.jar:?]
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:840)
~[hadoop-hdfs-client-2.10.1.jar:?]
{"type":"log", "level":"DEBUG", "time":"2021-03-15T05:06:09.570Z",
"timezone":"UTC", "log":"unwrapping token of length:54"}
{"type":"log", "level":"DEBUG", "time":"2021-03-15T05:06:09.599Z",
"timezone":"UTC", "log":"IPC Client (1437736861) connection to
vm-10-75-47-157/10.75.47.157:8020 from cspk got value #4"}

Any inputs on how to fix this issue would be helpful for us.

Thanks and Regards,
kaki mahesh raja



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark 3.0.1 giving warning while running with Java 11

2021-01-15 Thread Sachit Murarka
Sure Sean. Thanks for confirmation.

On Fri, 15 Jan 2021, 10:57 Sean Owen,  wrote:

> You can ignore that. Spark 3.x works with Java 11 but it will generate
> some warnings that are safe to disregard.
>
> On Thu, Jan 14, 2021 at 11:26 PM Sachit Murarka 
> wrote:
>
>> Hi All,
>>
>> Getting warning while running spark3.0.1 with Java11 .
>>
>>
>> WARNING: An illegal reflective access operation has occurred
>> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (
>> file:/opt/spark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor
>> java.nio.DirectByteBuffer(long,int)
>> WARNING: Please consider reporting this to the maintainers of
>> org.apache.spark.unsafe.Platform
>> WARNING: Use --illegal-access=warn to enable warnings of further illegal
>> reflective access operations
>> WARNING: All illegal access operations will be denied in a future release
>>
>>
>>
>> Kind Regards,
>> Sachit Murarka
>>
>


Re: Spark 3.0.1 giving warning while running with Java 11

2021-01-14 Thread Sean Owen
You can ignore that. Spark 3.x works with Java 11 but it will generate some
warnings that are safe to disregard.
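
If the log noise is a concern, one option (an untested sketch, not an official
fix; whether these two flags cover every warning in a given job is an
assumption) is to open the relevant JDK modules explicitly so the reflective
access is no longer flagged, by adding to the spark-submit command:

   --conf spark.driver.extraJavaOptions="--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED" \
   --conf spark.executor.extraJavaOptions="--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED" \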

On Thu, Jan 14, 2021 at 11:26 PM Sachit Murarka 
wrote:

> Hi All,
>
> Getting warning while running spark3.0.1 with Java11 .
>
>
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (
> file:/opt/spark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor
> java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of
> org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
>
>
> Kind Regards,
> Sachit Murarka
>


Spark 3.0.1 giving warning while running with Java 11

2021-01-14 Thread Sachit Murarka
Hi All,

I am getting warnings while running Spark 3.0.1 with Java 11.


WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (
file:/opt/spark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor
java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of
org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations
WARNING: All illegal access operations will be denied in a future release


Kind Regards,
Sachit Murarka


11

2020-12-16 Thread 张洪斌




Sent from NetEase Mail Master

Re: Spark Compatibility with Java 11

2020-07-14 Thread Ankur Mittal
Thanks a lot.

On Tue, Jul 14, 2020 at 12:51 PM Prashant Sharma 
wrote:

> Hi Ankur,
>
> Java 11 support was added in Spark 3.0.
> https://issues.apache.org/jira/browse/SPARK-24417
>
> Thanks,
>
>
> On Tue, Jul 14, 2020 at 6:12 PM Ankur Mittal 
> wrote:
>
>> Hi,
>>
>> I am using Spark 2.X and need to execute Java 11 .Its not able to execute
>> Java 11 using Spark 2.X.
>>
>> Is there any way we can use Java 11 with Spark2.X?
>>
>> Has this issue been resolved  in Spark 3.0 ?
>>
>>
>> --
>> Regards
>> Ankur Mittal
>>
>>

-- 
Regards
Ankur Mittal
*+91-8447899504*


Re: Spark Compatibility with Java 11

2020-07-14 Thread Prashant Sharma
Hi Ankur,

Java 11 support was added in Spark 3.0.
https://issues.apache.org/jira/browse/SPARK-24417

Thanks,


On Tue, Jul 14, 2020 at 6:12 PM Ankur Mittal 
wrote:

> Hi,
>
> I am using Spark 2.X and need to execute Java 11 .Its not able to execute
> Java 11 using Spark 2.X.
>
> Is there any way we can use Java 11 with Spark2.X?
>
> Has this issue been resolved  in Spark 3.0 ?
>
>
> --
> Regards
> Ankur Mittal
>
>


Spark Compatibility with Java 11

2020-07-14 Thread Ankur Mittal
Hi,

I am using Spark 2.x and need to run with Java 11. It is not able to run with
Java 11 using Spark 2.x.

Is there any way we can use Java 11 with Spark 2.x?

Has this issue been resolved in Spark 3.0?


-- 
Regards
Ankur Mittal


Re: Java 11 support in Spark 2.5

2020-01-02 Thread Jatin Puri
From this thread
(http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27966),
it looks like there is no confirmation yet that Spark 2.5 will have JDK 11
support at all.

Spark 3 will most likely be out soon (tentatively this quarter, as per the
mailing list). Spark 3 is going to have JDK 11 support.

From: Sinha, Breeta (Nokia - IN/Bangalore) 
Sent: Thursday, January 2, 2020 12:48 PM
To: user@spark.apache.org 
Cc: Rao, Abhishek (Nokia - IN/Bangalore) ; Imandi, 
Srinivas (Nokia - IN/Bangalore) 
Subject: Java 11 support in Spark 2.5


Hi All,



Wanted to know if Java 11 support is added in Spark 2.5.

If so, what is the expected timeline for Spark 2.5 release?



Kind Regards,

Breeta Sinha




Java 11 support in Spark 2.5

2020-01-01 Thread Sinha, Breeta (Nokia - IN/Bangalore)
Hi All,

Wanted to know if Java 11 support is added in Spark 2.5.
If so, what is the expected timeline for Spark 2.5 release?

Kind Regards,
Breeta Sinha



Re: can Spark 2.4 work on JDK 11?

2018-09-29 Thread Felix Cheung
Not officially. We have seen problems with JDK 10 as well. It would be great if
you or someone else would like to contribute to get it to work.



From: kant kodali 
Sent: Tuesday, September 25, 2018 2:31 PM
To: user @spark
Subject: can Spark 2.4 work on JDK 11?

Hi All,

can Spark 2.4 work on JDK 11? I feel like there are lot of features that are 
added in JDK 9, 10, 11 that can make deployment process a whole lot better and 
of course some more syntax sugar similar to Scala.

Thanks!


can Spark 2.4 work on JDK 11?

2018-09-25 Thread kant kodali
Hi All,

can Spark 2.4 work on JDK 11? I feel like there are a lot of features added in
JDK 9, 10 and 11 that can make the deployment process a whole lot better, and
of course some more syntactic sugar similar to Scala.

Thanks!


ApacheCon CFP closing soon (11 February)

2017-01-18 Thread Rich Bowen
Hello, fellow Apache enthusiast. Thanks for your participation, and
interest in, the projects of the Apache Software Foundation.

I wanted to remind you that the Call For Papers (CFP) for ApacheCon
North America, and Apache: Big Data North America, closes in less than a
month. If you've been putting it off because there was lots of time
left, it's time to dig for that inspiration and get those talk proposals in.

It's also time to discuss with your developer and user community whether
there's a track of talks that you might want to propose, so that you
have more complete coverage of your project than a talk or two.

We're looking for talks directly, and indirectly, related to projects at
the Apache Software Foundation. These can be anything from in-depth
technical discussions of the projects you work with, to talks about
community, documentation, legal issues, marketing, and so on. We're also
very interested in talks about projects and services built on top of
Apache projects, and case studies of how you use Apache projects to
solve real-world problems.

We are particularly interested in presentations from Apache projects
either in the Incubator, or recently graduated. ApacheCon is where
people come to find out what technology they'll be using this time next
year.

Important URLs are:

To submit a talk for Apache: Big Data -
http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp
To submit a talk for ApacheCon -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp

To register for Apache: Big Data -
http://events.linuxfoundation.org/events/apache-big-data-north-america/attend/register-
To register for ApacheCon -
http://events.linuxfoundation.org/events/apachecon-north-america/attend/register-

Early Bird registration rates end March 12th, but if you're a committer
on an Apache project, you get the low committer rate, which is less than
half of the early bird rate!

For further updates about ApacheCon, follow us on Twitter, @ApacheCon,
or drop by our IRC channel, #apachecon on the Freenode IRC network. Or
contact me - rbo...@apache.org - with any questions or concerns.

Thanks!

Rich Bowen, VP Conferences, Apache Software Foundation

-- 
(You've received this email because you're on a dev@ or users@ mailing
list of an Apache Software Foundation project. For subscription and
unsubscription information, consult the headers of this email message,
as this varies from one list to another.)

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Schedule lunchtime today for a free webinar IoT data ingestion in Spark Streaming using Kaa 11 a.m. PDT (2 p.m. EDT)

2015-08-04 Thread orozvadovskyy
Hi there! 

If you missed our webinar on IoT data ingestion in Spark with KaaIoT, see the 
video and slides here: http://goo.gl/VMyQ1M 

We recorded our webinar on “IoT data ingestion in Spark Streaming using Kaa” 
for those who couldn’t see it live or who would like to refresh what they have 
learned. During the webinar, we explained and illustrated how Kaa and Spark can 
be effectively used together to address the challenges of IoT data gathering 
and analysis. In this video, you will find highly crystallized, practical 
instruction on setting up your own stream analytics solution with Kaa and 
Spark. 

Best wishes, 
Oleh Rozvadovskyy 
CyberVision Inc 

- Original message -

From: Oleh Rozvadovskyy orozvadovs...@cybervisiontech.com
To: user@spark.apache.org
Sent: Thursday, 23 July 2015, 17:48:11
Subject: Schedule lunchtime today for a free webinar IoT data ingestion in Spark
Streaming using Kaa 11 a.m. PDT (2 p.m. EDT)

Hi there! 

Only couple of hours left to our first webinar on IoT data ingestion in Spark 
Streaming using Kaa . 



During the webinar we will build a solution that ingests real-time data from 
Intel Edison into Apache Spark for stream processing. This solution includes a 
client, middleware, and analytics components. All of these software components 
are 100% open-source, therefore, the solution described in this tutorial can be 
used as a prototype for even a commercial product. 

Those who are interested, please feel free to sign up here.

Best wishes, 
Oleh Rozvadovskyy 
CyberVision Inc. 




Schedule lunchtime today for a free webinar IoT data ingestion in Spark Streaming using Kaa 11 a.m. PDT (2 p.m. EDT)

2015-07-23 Thread Oleh Rozvadovskyy
Hi there!

Only a couple of hours left until our first webinar on *IoT data ingestion in
Spark Streaming using Kaa*.



During the webinar we will build a solution that ingests real-time data
from Intel Edison into Apache Spark for stream processing. This solution
includes a client, middleware, and analytics components. All of these
software components are 100% open-source, therefore, the solution described
in this tutorial can be used as a prototype for even a commercial product.

Those who are interested, please feel free to sign up here:
https://goo.gl/rgWuj6.

Best wishes,
Oleh Rozvadovskyy
CyberVision Inc.



Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-19 Thread Romi Kuntsman
I have recently encountered a similar problem with Guava version collision
with Hadoop.

Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are
they staying in version 11, does anyone know?

*Romi Kuntsman*, *Big Data Engineer*
 http://www.totango.com

On Wed, Jan 7, 2015 at 7:59 AM, Niranda Perera niranda.per...@gmail.com
wrote:

 Hi Sean,

 I removed the hadoop dependencies from the app and ran it on the cluster.
 It gives a java.io.EOFException

 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(177166) called with
 curMem=0, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0 stored as values in
 memory (estimated size 173.0 KB, free 1911.2 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(25502) called with
 curMem=177166, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0_piece0 stored as
 bytes in memory (estimated size 24.9 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in
 memory on 10.100.5.109:43924 (size: 24.9 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block
 broadcast_0_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 0 from hadoopFile
 at AvroRelation.scala:45
 15/01/07 11:19:29 INFO FileInputFormat: Total input paths to process : 1
 15/01/07 11:19:29 INFO SparkContext: Starting job: collect at
 SparkPlan.scala:84
 15/01/07 11:19:29 INFO DAGScheduler: Got job 0 (collect at
 SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
 15/01/07 11:19:29 INFO DAGScheduler: Final stage: Stage 0(collect at
 SparkPlan.scala:84)
 15/01/07 11:19:29 INFO DAGScheduler: Parents of final stage: List()
 15/01/07 11:19:29 INFO DAGScheduler: Missing parents: List()
 15/01/07 11:19:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at
 map at SparkPlan.scala:84), which has no missing parents
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(4864) called with
 curMem=202668, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1 stored as values in
 memory (estimated size 4.8 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(3481) called with
 curMem=207532, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1_piece0 stored as
 bytes in memory (estimated size 3.4 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in
 memory on 10.100.5.109:43924 (size: 3.4 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block
 broadcast_1_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 1 from broadcast at
 DAGScheduler.scala:838
 15/01/07 11:19:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage
 0 (MappedRDD[6] at map at SparkPlan.scala:84)
 15/01/07 11:19:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
 0, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID
 1, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1,
 10.100.5.109): java.io.EOFException
 at
 java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2722)
 at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1009)
 at
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
 at
 org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
 at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
 at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
 at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
 at
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
 at
 org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
 at
 org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
 at
 org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
 at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
 at
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1969)
 at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-19 Thread Romi Kuntsman
Actually there is already someone on Hadoop-Common-Dev taking care of
removing the old Guava dependency

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201501.mbox/browser
https://issues.apache.org/jira/browse/HADOOP-11470

*Romi Kuntsman*, *Big Data Engineer*
 http://www.totango.com

On Mon, Jan 19, 2015 at 4:03 PM, Romi Kuntsman r...@totango.com wrote:

 I have recently encountered a similar problem with Guava version collision
 with Hadoop.

 Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are
 they staying in version 11, does anyone know?

 *Romi Kuntsman*, *Big Data Engineer*
  http://www.totango.com

 On Wed, Jan 7, 2015 at 7:59 AM, Niranda Perera niranda.per...@gmail.com
 wrote:

 Hi Sean,

 I removed the hadoop dependencies from the app and ran it on the cluster.
 It gives a java.io.EOFException

 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(177166) called with
 curMem=0, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0 stored as values in
 memory (estimated size 173.0 KB, free 1911.2 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(25502) called with
 curMem=177166, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0_piece0 stored as
 bytes in memory (estimated size 24.9 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in
 memory on 10.100.5.109:43924 (size: 24.9 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block
 broadcast_0_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 0 from hadoopFile
 at AvroRelation.scala:45
 15/01/07 11:19:29 INFO FileInputFormat: Total input paths to process : 1
 15/01/07 11:19:29 INFO SparkContext: Starting job: collect at
 SparkPlan.scala:84
 15/01/07 11:19:29 INFO DAGScheduler: Got job 0 (collect at
 SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
 15/01/07 11:19:29 INFO DAGScheduler: Final stage: Stage 0(collect at
 SparkPlan.scala:84)
 15/01/07 11:19:29 INFO DAGScheduler: Parents of final stage: List()
 15/01/07 11:19:29 INFO DAGScheduler: Missing parents: List()
 15/01/07 11:19:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at
 map at SparkPlan.scala:84), which has no missing parents
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(4864) called with
 curMem=202668, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1 stored as values in
 memory (estimated size 4.8 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(3481) called with
 curMem=207532, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1_piece0 stored as
 bytes in memory (estimated size 3.4 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in
 memory on 10.100.5.109:43924 (size: 3.4 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block
 broadcast_1_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 1 from broadcast
 at DAGScheduler.scala:838
 15/01/07 11:19:29 INFO DAGScheduler: Submitting 2 missing tasks from
 Stage 0 (MappedRDD[6] at map at SparkPlan.scala:84)
 15/01/07 11:19:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0
 (TID 0, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0
 (TID 1, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1,
 10.100.5.109): java.io.EOFException
 at
 java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2722)
 at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1009)
 at
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
 at
 org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
 at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
 at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
 at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
 at
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
 at
 org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
 at
 org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
 at
 org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
 at
 java.io.ObjectInputStream.readSerialData

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-19 Thread Ted Yu
Please see this thread:

http://search-hadoop.com/m/LgpTk2aVYgr/Hadoop+guava+upgradesubj=Re+Time+to+address+the+Guava+version+problem


 On Jan 19, 2015, at 6:03 AM, Romi Kuntsman r...@totango.com wrote:
 
 I have recently encountered a similar problem with Guava version collision 
 with Hadoop.
 
 Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are they 
 staying in version 11, does anyone know?
 
 Romi Kuntsman, Big Data Engineer
 http://www.totango.com
 
 On Wed, Jan 7, 2015 at 7:59 AM, Niranda Perera niranda.per...@gmail.com 
 wrote:
 Hi Sean, 
 
 I removed the hadoop dependencies from the app and ran it on the cluster. It 
 gives a java.io.EOFException 
 
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(177166) called with 
 curMem=0, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.0 KB, free 1911.2 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(25502) called with 
 curMem=177166, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 24.9 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
 on 10.100.5.109:43924 (size: 24.9 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block 
 broadcast_0_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 0 from hadoopFile at 
 AvroRelation.scala:45
 15/01/07 11:19:29 INFO FileInputFormat: Total input paths to process : 1
 15/01/07 11:19:29 INFO SparkContext: Starting job: collect at 
 SparkPlan.scala:84
 15/01/07 11:19:29 INFO DAGScheduler: Got job 0 (collect at 
 SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
 15/01/07 11:19:29 INFO DAGScheduler: Final stage: Stage 0(collect at 
 SparkPlan.scala:84)
 15/01/07 11:19:29 INFO DAGScheduler: Parents of final stage: List()
 15/01/07 11:19:29 INFO DAGScheduler: Missing parents: List()
 15/01/07 11:19:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at map 
 at SparkPlan.scala:84), which has no missing parents
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(4864) called with 
 curMem=202668, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1 stored as values in 
 memory (estimated size 4.8 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(3481) called with 
 curMem=207532, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes 
 in memory (estimated size 3.4 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
 on 10.100.5.109:43924 (size: 3.4 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block 
 broadcast_1_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 1 from broadcast at 
 DAGScheduler.scala:838
 15/01/07 11:19:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 
 (MappedRDD[6] at map at SparkPlan.scala:84)
 15/01/07 11:19:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 
 0, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 
 1, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 
 10.100.5.109): java.io.EOFException
 at 
 java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2722)
 at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1009)
 at 
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
 at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
 at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
 at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
 at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
 at 
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
 at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
 at 
 org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
 at 
 org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
 at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Sean Owen
-dev

Guava was not downgraded to 11. That PR was not merged. It was part of a
discussion about, indeed, what to do about potential Guava version
conflicts. Spark uses Guava, but so does Hadoop, and so do user programs.

Spark uses 14.0.1 in fact:
https://github.com/apache/spark/blob/master/pom.xml#L330

This is a symptom of conflict between Spark's Guava 14 and Hadoop's Guava
11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as well.

Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a
lot of these problems are solved. As we've seen though, this one is tricky.

What's your Spark version? and what are you executing? what mode --
standalone, YARN? What Hadoop version?


On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera niranda.per...@gmail.com
wrote:

 Hi,

 I have been running a simple Spark app on a local spark cluster and I came
 across this error.

 Exception in thread main java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
 at org.apache.spark.util.collection.OpenHashSet.org
 $apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
 at
 org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
 at
 org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
 at
 org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
 at
 org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
 at
 org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
 at
 org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
 at
 org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
 at
 org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
 at
 org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
 at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
 at
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
 at
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
 at
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
 at
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
 at
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
 at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
 at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
 at
 com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
 at
 com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
 at
 org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
 at
 org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 at
 org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 at
 org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
 at
 org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
 at
 org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
 at
 org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
 at
 org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
 at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
 at
 org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)


 While looking into this I found out that Guava was downgraded to version
 11 in this PR.
 https://github.com/apache/spark/pull/1610

 In this PR OpenHashSet.scala:261 line hashInt has been changed to
 hashLong.
 But when I actually run my app,  java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashInt error occurs,
 which is understandable because hashInt is not available before Guava 12.

 So, I''m wondering why this occurs?

 Cheers
 --
 Niranda Perera




Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Niranda Perera
Hi Sean,

My mistake, the Guava 11 dependency indeed came from hadoop-common.

I'm running the following simple app on a Spark 1.2.0 standalone local cluster
(2 workers) with Hadoop 1.2.1:

public class AvroSparkTest {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf()
                .setMaster("spark://niranda-ThinkPad-T540p:7077") // ("local[2]")
                .setAppName("avro-spark-test");

        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        JavaSQLContext sqlContext = new JavaSQLContext(sparkContext);
        JavaSchemaRDD episodes = AvroUtils.avroFile(sqlContext,
                "/home/niranda/projects/avro-spark-test/src/test/resources/episodes.avro");
        episodes.printSchema();
        episodes.registerTempTable("avroTable");
        List<Row> result = sqlContext.sql("SELECT * FROM avroTable").collect();

        for (Row row : result) {
            System.out.println(row.toString());
        }
    }
}

As you pointed out, this error occurs when the Hadoop dependency is added. It
runs without a problem when the Hadoop dependency is removed and the master is
set to local[].

Cheers

On Tue, Jan 6, 2015 at 3:23 PM, Sean Owen so...@cloudera.com wrote:

 -dev

 Guava was not downgraded to 11. That PR was not merged. It was part of a
 discussion about, indeed, what to do about potential Guava version
 conflicts. Spark uses Guava, but so does Hadoop, and so do user programs.

 Spark uses 14.0.1 in fact:
 https://github.com/apache/spark/blob/master/pom.xml#L330

 This is a symptom of conflict between Spark's Guava 14 and Hadoop's Guava
 11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as
 well.

 Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a
 lot of these problems are solved. As we've seen though, this one is tricky.

 What's your Spark version? and what are you executing? what mode --
 standalone, YARN? What Hadoop version?


 On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera niranda.per...@gmail.com
 wrote:

 Hi,

 I have been running a simple Spark app on a local spark cluster and I
 came across this error.

 Exception in thread main java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
 at org.apache.spark.util.collection.OpenHashSet.org
 $apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
 at
 org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
 at
 org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
 at
 org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
 at
 org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
 at
 org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
 at
 org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
 at
 org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
 at
 org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
 at
 org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
 at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
 at
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
 at
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
 at
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
 at
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
 at
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
 at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
 at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
 at
 com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
 at
 com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
 at
 org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
 at
 org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 at
 org.apache.spark.sql.catalyst.planning.QueryPlanner

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Sean Owen
Oh, are you actually bundling Hadoop in your app? that may be the problem.
If you're using stand-alone mode, why include Hadoop? In any event, Spark
and Hadoop are intended to be 'provided' dependencies in the app you send
to spark-submit.
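
A quick way to check where a conflicting Guava (or a bundled Hadoop) is coming
from in a Maven-built app, before marking Spark and Hadoop as provided (a
sketch; it only assumes the standard maven-dependency-plugin):

# show which artifacts pull in Guava transitively
mvn dependency:tree -Dincludes=com.google.guava:guava

# list any Hadoop artifacts that end up on the app's classpath
mvn dependency:tree | grep -i 'org.apache.hadoop'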

On Tue, Jan 6, 2015 at 10:15 AM, Niranda Perera niranda.per...@gmail.com
wrote:

 Hi Sean,

 My mistake, Guava 11 dependency came from the hadoop-commons indeed.

 I'm running the following simple app in spark 1.2.0 standalone local
 cluster (2 workers) with Hadoop 1.2.1

 public class AvroSparkTest {
 public static void main(String[] args) throws Exception {
 SparkConf sparkConf = new SparkConf()
 .setMaster(spark://niranda-ThinkPad-T540p:7077)
 //(local[2])
 .setAppName(avro-spark-test);

 JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
 JavaSQLContext sqlContext = new JavaSQLContext(sparkContext);
 JavaSchemaRDD episodes = AvroUtils.avroFile(sqlContext,

 /home/niranda/projects/avro-spark-test/src/test/resources/episodes.avro);
 episodes.printSchema();
 episodes.registerTempTable(avroTable);
 ListRow result = sqlContext.sql(SELECT * FROM
 avroTable).collect();

 for (Row row : result) {
 System.out.println(row.toString());
 }
 }
 }

 As you pointed out, this error occurs while adding the hadoop dependency.
 this runs without a problem when the hadoop dependency is removed and the
 master is set to local[].

 Cheers

 On Tue, Jan 6, 2015 at 3:23 PM, Sean Owen so...@cloudera.com wrote:

 -dev

 Guava was not downgraded to 11. That PR was not merged. It was part of a
 discussion about, indeed, what to do about potential Guava version
 conflicts. Spark uses Guava, but so does Hadoop, and so do user programs.

 Spark uses 14.0.1 in fact:
 https://github.com/apache/spark/blob/master/pom.xml#L330

 This is a symptom of conflict between Spark's Guava 14 and Hadoop's Guava
 11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as
 well.

 Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a
 lot of these problems are solved. As we've seen though, this one is tricky.

 What's your Spark version? and what are you executing? what mode --
 standalone, YARN? What Hadoop version?


 On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera niranda.per...@gmail.com
 wrote:

 Hi,

 I have been running a simple Spark app on a local spark cluster and I
 came across this error.

 Exception in thread main java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
 at org.apache.spark.util.collection.OpenHashSet.org
 $apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
 at
 org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
 at
 org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
 at
 org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
 at
 org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
 at
 org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
 at
 org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
 at
 org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
 at
 org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
 at
 org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
 at
 org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
 at
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
 at
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
 at
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
 at
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
 at
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
 at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
 at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
 at
 com.databricks.spark.avro.AvroRelation.buildScan$lzycompute

Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Niranda Perera
Hi,

I have been running a simple Spark app on a local spark cluster and I came
across this error.

Exception in thread main java.lang.NoSuchMethodError:
com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
at org.apache.spark.util.collection.OpenHashSet.org
$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
at
org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
at
org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
at
org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at
org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
at
org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
at
org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
at
org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
at
org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
at
org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
at
org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
at
org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
at
org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
at
org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
at
org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
at
org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
at
org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
at
org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)
at
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
at
org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
at
com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
at
com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
at
org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
at
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
at
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
at
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
at
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
at
org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)


While looking into this I found that Guava was downgraded to version 11 in
this PR:
https://github.com/apache/spark/pull/1610

In this PR, the hashInt call at OpenHashSet.scala:261 was changed to hashLong.
But when I actually run my app, the java.lang.NoSuchMethodError:
com.google.common.hash.HashFunction.hashInt error occurs,
which is understandable because hashInt is not available before Guava 12.

So I'm wondering why this occurs?

Cheers
-- 
Niranda Perera


Hive 11 / CDH 4.6/ Spark 0.9.1 dilemmna

2014-08-06 Thread Anurag Tangri
I posted this to the cdh-user mailing list yesterday, but I think this is the
right audience for it:

=

Hi All,
Not sure if anyone else has faced this same issue or not.

We installed CDH 4.6, which uses Hive 0.10.

And we have Spark 0.9.1, which comes with Hive 11.

Now our Hive jobs that work on CDH fail in Shark.

Is anyone else facing the same issues, and are there any workarounds?

Can we recompile Shark 0.9.1 with Hive 10, or compile Hive 11 on CDH 4.6?



Thanks,
Anurag Tangri


Re: Hive 11 / CDH 4.6/ Spark 0.9.1 dilemmna

2014-08-06 Thread Sean Owen
I haven't tried any of this, mind you, but my guess is that your options, from
least painful and most likely to work onwards, are:

- Get Spark / Shark to compile against Hive 0.10
- Shade Hive 0.11 into Spark
- Update to CDH5.0+

I don't think there will be any more updated releases of Shark or
Spark-on-CDH4, so you may want to be moving forward anyway.


On Thu, Aug 7, 2014 at 12:46 AM, Anurag Tangri atan...@groupon.com wrote:

 I posted this in cdh-user mailing list yesterday and think this should
 have been the right audience for this:

 =

 Hi All,
 Not sure if anyone else faced this same issue or not.

 We installed CDH 4.6 that uses Hive 0.10.

 And we have Spark 0.9.1 that comes with Hive 11.

 Now our hive jobs that work on CDH, fail in Shark.

 Anyone else facing same issues and any work-arounds ?

 Can we re-compile shark 0.9.1 with hive 10 or compile hive 11 on CDH 4.6 ?



 Thanks,
 Anurag Tangri