[jira] [Commented] (SPARK-26943) Weird behaviour with `.cache()`

2019-03-02 Thread Will Uto (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782518#comment-16782518
 ] 

Will Uto commented on SPARK-26943:
--

Thanks for the information. I was hoping that installing PySpark v2.4.0 in a 
Python virtual environment on each cluster worker node would let me run 
against Spark v2.4.0, but it sounds like I would need to upgrade Spark itself 
through something like Cloudera Manager.
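
As a rough way to spot the mismatch described above, the sketch below compares 
a client-side PySpark version with the cluster's Spark version. It assumes, as 
this comment concludes, that both need to be on the same major.minor release 
line; that rule and the helper names `major_minor` and `versions_compatible` 
are illustrative and not part of Spark.

```python
def major_minor(version):
    """Return (major, minor) as ints from a version string such as '2.4.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_compatible(client_version, cluster_version):
    """True when both versions share the same major.minor release line.

    Assumption for illustration only: matching major.minor is treated as
    "compatible"; a real deployment may need a stricter or looser rule.
    """
    return major_minor(client_version) == major_minor(cluster_version)


# The versions named in this thread: a 2.4.0 client against a 2.1.0 cluster.
print(versions_compatible("2.4.0", "2.1.0"))
```

On a live session the two inputs could come from `pyspark.__version__` and 
`spark.version`, where `spark` is an existing SparkSession.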

> Weird behaviour with `.cache()`
> ---
>
> Key: SPARK-26943
> URL: https://issues.apache.org/jira/browse/SPARK-26943
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Will Uto
>Priority: Major
>
>  
> {code:java}
> sdf.count(){code}
>  
> works fine. However:
>  
> {code:java}
> sdf = sdf.cache()
> sdf.count()
> {code}
>  does not, and produces error
> {code:java}
> Py4JJavaError: An error occurred while calling o314.count.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 75 
> in stage 8.0 failed 4 times, most recent failure: Lost task 75.3 in stage 8.0 
> (TID 438, uat-datanode-02, executor 1): java.text.ParseException: Unparseable 
> number: "(N/A)"
>   at java.text.NumberFormat.parse(NumberFormat.java:350)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26943) Weird behaviour with `.cache()`

2019-03-02 Thread Will Uto (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782418#comment-16782418
 ] 

Will Uto commented on SPARK-26943:
--

Thanks for the explanation, [~srowen], that makes sense. I think this is why I 
couldn't reproduce it locally (on a smaller dataset).
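
The explanation referred to is not quoted in this message. One plausible 
reading, shown as a plain-Python sketch rather than Spark code, is that 
counting rows does not need to parse every field, whereas materialising the 
data (as caching does) parses all of them and trips on a placeholder such as 
"(N/A)". The sample rows and the helper names `count_rows` and `materialize` 
are hypothetical.

```python
# Hypothetical raw rows; only the second one contains the "(N/A)" placeholder.
rows = [["1", "2.5"], ["2", "(N/A)"], ["3", "7.0"]]


def count_rows(raw_rows):
    """Count rows without looking at, or parsing, any field."""
    return sum(1 for _ in raw_rows)


def materialize(raw_rows):
    """Parse every field as a number, as a fully materialised copy would need."""
    return [[float(field) for field in row] for row in raw_rows]


print(count_rows(rows))  # 3: no field is parsed, so the placeholder is never hit

try:
    materialize(rows)
except ValueError as exc:
    # float("(N/A)") raises ValueError, standing in for the ParseException.
    print("parse failed:", exc)

# A smaller sample that happens to miss the bad row parses cleanly.
print(materialize(rows[:1]))  # [[1.0, 2.5]]
```

This would also be consistent with the failure not showing up on a smaller 
local dataset that happens to contain no placeholder values.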

Out of curiosity, is there a way to run a newer version of Spark on a cluster, 
e.g. within Python virtual environments, or do I have to upgrade the entire 
cluster?







[jira] [Commented] (SPARK-26943) Weird behaviour with `.cache()`

2019-02-28 Thread Will Uto (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780491#comment-16780491
 ] 

Will Uto commented on SPARK-26943:
--

I think this has to do with using Spark version 2.1.0. Is it possible to use a 
later version of Spark without upgrading the entire cluster? (E.g. can I 
install a later version of PySpark in every driver/worker virtual environment, 
and would it then be used across the cluster?)

 







[jira] [Updated] (SPARK-26943) Weird behaviour with `.cache()`

2019-02-20 Thread Will Uto (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Uto updated SPARK-26943:
-
Description: 
 
{code:java}
sdf.count()
{code}

works fine. However:

{code:java}
sdf = sdf.cache()
sdf.count()
{code}

does not, and produces this error:

{code:java}
Py4JJavaError: An error occurred while calling o314.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 75 in 
stage 8.0 failed 4 times, most recent failure: Lost task 75.3 in stage 8.0 (TID 
438, uat-datanode-02, executor 1): java.text.ParseException: Unparseable 
number: "(N/A)"
at java.text.NumberFormat.parse(NumberFormat.java:350)
{code}

  was:
 
{code:java}
sdf.count(){code}
 

works fine. However:

 
{code:java}
sdf = sdf.cache()
sdf.count()

{code}
 does not, and produces error
{code:java}
Py4JJavaError: An error occurred while calling o314.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 75 in 
stage 8.0 failed 4 times, most recent failure: Lost task 75.3 in stage 8.0 (TID 
438, uat-datanode-02.mint.ukho.gov.uk, executor 1): java.text.ParseException: 
Unparseable number: "(N/A)"
at java.text.NumberFormat.parse(NumberFormat.java:350)
{code}








[jira] [Updated] (SPARK-26943) Weird behaviour with `.cache()`

2019-02-20 Thread Will Uto (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Uto updated SPARK-26943:
-
Description: 
 
{code:java}
sdf.count(){code}
 

works fine. However:

 
{code:java}
sdf = sdf.cache()
sdf.count()

{code}
 does not, and produces error
{code:java}
Py4JJavaError: An error occurred while calling o314.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 75 in 
stage 8.0 failed 4 times, most recent failure: Lost task 75.3 in stage 8.0 (TID 
438, uat-datanode-02.mint.ukho.gov.uk, executor 1): java.text.ParseException: 
Unparseable number: "(N/A)"
at java.text.NumberFormat.parse(NumberFormat.java:350)
{code}

  was:
 
{code:java}
sdf.count(){code}
 

works fine. However:

 
{code:java}
sdf = sdf.cache()
sdf.count()

{code}
 








[jira] [Created] (SPARK-26943) Weird behaviour with `.cache()`

2019-02-20 Thread Will Uto (JIRA)
Will Uto created SPARK-26943:


 Summary: Weird behaviour with `.cache()`
 Key: SPARK-26943
 URL: https://issues.apache.org/jira/browse/SPARK-26943
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.1.0
Reporter: Will Uto


 
{code:java}
sdf.count()
{code}

works fine. However:

{code:java}
sdf = sdf.cache()
sdf.count()
{code}
 


