[jira] [Commented] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation

2017-01-04 Thread Tim Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799881#comment-15799881
 ] 

Tim Chan commented on SPARK-19013:
--

[~zsxwing]

{code}
Error:
java.util.ConcurrentModificationException: Multiple HDFSMetadataLog are using s3://lumos-emr-logs/streaming-insights-ebb-and-flow-speed-accuracy/offsets
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(HDFSMetadataLog.scala:119)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1$$anonfun$apply$mcZ$sp$1.apply(HDFSMetadataLog.scala:119)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1$$anonfun$apply$mcZ$sp$1.apply(HDFSMetadataLog.scala:119)
    at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:79)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply$mcZ$sp(HDFSMetadataLog.scala:119)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:115)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:115)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:115)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$1.apply$mcV$sp(StreamExecution.scala:346)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$1.apply(StreamExecution.scala:345)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$1.apply(StreamExecution.scala:345)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$reportTimeTaken(StreamExecution.scala:656)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch(StreamExecution.scala:345)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply$mcZ$sp(StreamExecution.scala:219)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:213)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:213)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$reportTimeTaken(StreamExecution.scala:656)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:212)
    at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:43)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:208)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:142)
Caused by: java.io.FileNotFoundException: No such file or directory 's3://lumos-emr-logs/streaming-insights-ebb-and-flow-speed-accuracy/offsets/.45b98c69-6158-4434-a7b2-c3f73d27294e.tmp'
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:812)
    at org.apache.hadoop.fs.FileSystem.getFileLinkStatus(FileSystem.java:2286)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileLinkStatus(EmrFileSystem.java:521)
    at org.apache.hadoop.fs.DelegateToFileSystem.getFileLinkStatus(DelegateToFileSystem.java:130)
    at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:705)
    at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678)
    at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$FileContextManager.rename(HDFSMetadataLog.scala:309)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:150)
{code}

[jira] [Commented] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation

2016-12-27 Thread Tim Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781029#comment-15781029
 ] 

Tim Chan commented on SPARK-19013:
--

Perhaps the documentation should be revised to recommend against using S3 as 
the {{checkpointLocation}}? I will test with an HDFS location and 
update this ticket with my findings. 
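
For context, the stack trace in the 2017-01-04 comment on this ticket shows the failure surfacing inside a rename of a `.tmp` file during `writeBatch`. The sketch below is a toy Python model of a write-to-temp-then-rename commit. It assumes (this is not confirmed anywhere in the ticket) that the temp file is not yet visible when the rename looks it up. `ToyStore`, `write_batch`, and the `delayed_visibility` flag are hypothetical names for illustration only; they are not Spark, Hadoop, or S3 APIs.

```python
import uuid


class ToyStore:
    """In-memory stand-in for a file/object store (hypothetical, illustration only)."""

    def __init__(self, delayed_visibility):
        self.visible = {}   # keys that lookups can see
        self.pending = {}   # keys written but not yet visible
        self.delayed_visibility = delayed_visibility

    def put(self, key, data):
        # With delayed visibility, a fresh write is parked until settle() is called.
        target = self.pending if self.delayed_visibility else self.visible
        target[key] = data

    def settle(self):
        # Make every pending write visible.
        self.visible.update(self.pending)
        self.pending.clear()

    def rename(self, src, dst):
        # The rename first looks up the source, as the trace's getFileLinkStatus frame suggests.
        if src not in self.visible:
            raise FileNotFoundError(f"No such file or directory '{src}'")
        self.visible[dst] = self.visible.pop(src)


def write_batch(store, log_dir, batch_id, data):
    """Commit one batch: write a temp file, then rename it to the batch id."""
    tmp = f"{log_dir}/.{uuid.uuid4()}.tmp"
    store.put(tmp, data)
    store.rename(tmp, f"{log_dir}/{batch_id}")


# A store where writes are immediately visible commits cleanly.
consistent = ToyStore(delayed_visibility=False)
write_batch(consistent, "offsets", 0, b"batch-0")

# A store with delayed visibility fails at the rename step.
laggy = ToyStore(delayed_visibility=True)
try:
    write_batch(laggy, "offsets", 0, b"batch-0")
    rename_failed = False
except FileNotFoundError:
    rename_failed = True
```

In this toy model the single writer fails on its own rename, which would be consistent with seeing the error while only one instance of the job is running, but the actual cause on EMR may differ.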

> java.util.ConcurrentModificationException when using s3 path as 
> checkpointLocation 
> ---
>
> Key: SPARK-19013
> URL: https://issues.apache.org/jira/browse/SPARK-19013
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2
>Reporter: Tim Chan
>
> I have a structured stream job running on EMR. The job will fail due to this
> {code}
> Multiple HDFSMetadataLog are using s3://mybucket/myapp 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
> {code}
> There is only one instance of this stream job running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation

2016-12-27 Thread Tim Chan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Chan updated SPARK-19013:
-
Description: 
I have a structured stream job running on EMR. The job will fail due to this

{code}
Multiple HDFSMetadataLog are using s3://mybucket/myapp 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
{code}

There is only one instance of this stream job running.

  was:
I have a structured stream job running on EMR. The job will fail due to this

```
Multiple HDFSMetadataLog are using s3://mybucket/myapp 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
```

There is only one instance of this stream job running.


> java.util.ConcurrentModificationException when using s3 path as 
> checkpointLocation 
> ---
>
> Key: SPARK-19013
> URL: https://issues.apache.org/jira/browse/SPARK-19013
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2
>Reporter: Tim Chan
>
> I have a structured stream job running on EMR. The job will fail due to this
> {code}
> Multiple HDFSMetadataLog are using s3://mybucket/myapp 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
> {code}
> There is only one instance of this stream job running.






[jira] [Created] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation

2016-12-27 Thread Tim Chan (JIRA)
Tim Chan created SPARK-19013:


 Summary: java.util.ConcurrentModificationException when using s3 path as checkpointLocation
 Key: SPARK-19013
 URL: https://issues.apache.org/jira/browse/SPARK-19013
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.0.2
Reporter: Tim Chan


I have a structured stream job running on EMR. The job will fail due to this

```
Multiple HDFSMetadataLog are using s3://mybucket/myapp 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
```

There is only one instance of this stream job running.






[jira] [Commented] (SPARK-17466) Error message is not very clear

2016-09-09 Thread Tim Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477873#comment-15477873
 ] 

Tim Chan commented on SPARK-17466:
--

Thanks [~srowen]!

> Error message is not very clear
> ---
>
> Key: SPARK-17466
> URL: https://issues.apache.org/jira/browse/SPARK-17466
> Project: Spark
>  Issue Type: Improvement
>Reporter: Tim Chan
>Priority: Trivial
>
>  User class threw exception: org.apache.spark.sql.AnalysisException: Window 
> Frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW must match the 
> required frame ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING;
> The same Spark SQL that throws this exception in EMR 5.0.0 works just fine in 
> Databricks using Spark 2.0.0/Scala 2.11. I don't even understand what the 
> error means. 






[jira] [Commented] (SPARK-17466) Error message is not very clear

2016-09-09 Thread Tim Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477849#comment-15477849
 ] 

Tim Chan commented on SPARK-17466:
--

I suppose I don't understand why I'm limited to 1 preceding. 
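
One possible reading of the message, not confirmed in this thread: a `LAG` with offset 1 always reads exactly one row back, which is the same thing as a frame of `ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING`, so an explicit frame that says anything else conflicts with it. The plain-Python sketch below models that equivalence on a list; `rows_frame` and `lag` are hypothetical helper names, not Spark functions.

```python
def rows_frame(values, i, start_offset, end_offset):
    """Rows i+start_offset .. i+end_offset (inclusive), clipped to the partition.

    Negative offsets mean PRECEDING, positive mean FOLLOWING, 0 is CURRENT ROW.
    """
    lo = max(i + start_offset, 0)
    hi = min(i + end_offset, len(values) - 1)
    return list(values[lo:hi + 1]) if lo <= hi else []


def lag(values, offset=1):
    """LAG(value, offset) expressed as a one-row frame: ROWS BETWEEN offset PRECEDING AND offset PRECEDING."""
    result = []
    for i in range(len(values)):
        frame = rows_frame(values, i, -offset, -offset)
        # The frame holds at most one row; an empty frame yields NULL (None).
        result.append(frame[0] if frame else None)
    return result


lagged = lag([10, 20, 30], offset=1)
```

Under this reading, the frame is fixed by the function itself, so there is nothing for a user-supplied `ROWS` or `RANGE` clause to change.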

> Error message is not very clear
> ---
>
> Key: SPARK-17466
> URL: https://issues.apache.org/jira/browse/SPARK-17466
> Project: Spark
>  Issue Type: Improvement
>Reporter: Tim Chan
>Priority: Trivial
>
>  User class threw exception: org.apache.spark.sql.AnalysisException: Window 
> Frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW must match the 
> required frame ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING;
> The same Spark SQL that throws this exception in EMR 5.0.0 works just fine in 
> Databricks using Spark 2.0.0/Scala 2.11. I don't even understand what the 
> error means. 






[jira] [Created] (SPARK-17466) Error message is not very clear

2016-09-09 Thread Tim Chan (JIRA)
Tim Chan created SPARK-17466:


 Summary: Error message is not very clear
 Key: SPARK-17466
 URL: https://issues.apache.org/jira/browse/SPARK-17466
 Project: Spark
  Issue Type: Improvement
Reporter: Tim Chan
Priority: Trivial


 User class threw exception: org.apache.spark.sql.AnalysisException: Window 
Frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW must match the required 
frame ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING;

The same Spark SQL that throws this exception in EMR 5.0.0 works just fine in 
Databricks using Spark 2.0.0/Scala 2.11. I don't even understand what the error 
means. 






[jira] [Commented] (SPARK-17423) Support IGNORE NULLS option in Window functions

2016-09-07 Thread Tim Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472394#comment-15472394
 ] 

Tim Chan commented on SPARK-17423:
--

[~hvanhovell]

I was able to rewrite this Redshift fragment: 

{code:sql}
DATEDIFF(day,
  LAG(CASE WHEN SUM(activities.activity_one, activities.activity_two) > 0 THEN activities.date END)
    IGNORE NULLS OVER (PARTITION BY activities.user_id ORDER BY activities.date),
  activities.date
) AS days_since_last_activity
{code}

as this Spark SQL fragment: 

{code:sql}
DATEDIFF(activities.date,
  LAST(CASE WHEN SUM(activities.activity_one, activities.activity_two) > 0 THEN activities.date END, true)
    OVER (PARTITION BY activities.user_id ORDER BY activities.date
          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) AS days_since_last_activity
{code}

Thanks for pointing me in the right direction. 
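
The equivalence behind the rewrite above can be checked on a small list. The sketch below is plain Python, not Spark or Redshift code, and models a single partition that is already ordered by date. `lag_ignore_nulls` and `last_non_null_over_preceding` are hypothetical names, and the sample dates are made up for illustration.

```python
def lag_ignore_nulls(values):
    """LAG(expr) IGNORE NULLS: the nearest non-null value strictly before each row."""
    result = []
    last_seen = None
    for v in values:
        result.append(last_seen)
        if v is not None:
            last_seen = v
    return result


def last_non_null_over_preceding(values):
    """LAST(expr, true) OVER (... ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)."""
    result = []
    for i in range(len(values)):
        frame = values[:i]  # every row before the current one
        non_null = [v for v in frame if v is not None]
        result.append(non_null[-1] if non_null else None)
    return result


# Made-up sample: dates on active days, None on days with no activity.
dates = ["2016-09-01", None, None, "2016-09-04", None]
via_lag = lag_ignore_nulls(dates)
via_last = last_non_null_over_preceding(dates)
```

Both functions return the same list for this input, which is why ending the frame at `1 PRECEDING` matters: it excludes the current row, just as `LAG` does.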



> Support IGNORE NULLS option in Window functions
> ---
>
> Key: SPARK-17423
> URL: https://issues.apache.org/jira/browse/SPARK-17423
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Tim Chan
>Priority: Minor
>
> http://stackoverflow.com/questions/24338119/is-it-possible-to-ignore-null-values-when-using-lag-and-lead-functions-in-sq






[jira] [Created] (SPARK-17423) Support IGNORE NULLS option in Window functions

2016-09-06 Thread Tim Chan (JIRA)
Tim Chan created SPARK-17423:


 Summary: Support IGNORE NULLS option in Window functions
 Key: SPARK-17423
 URL: https://issues.apache.org/jira/browse/SPARK-17423
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Tim Chan
Priority: Minor


http://stackoverflow.com/questions/24338119/is-it-possible-to-ignore-null-values-when-using-lag-and-lead-functions-in-sq




