[jira] [Commented] (SPARK-14561) History Server does not see new logs in S3

2021-05-17 Thread Tianbin Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346563#comment-17346563
 ] 

Tianbin Jiang commented on SPARK-14561:
---

"If you are going to use s3 as the dest, and you want to see incomplete apps, 
then you'll need to configure the spark job to have smaller partition size (64? 
128? MB)."

Can anyone explain this in more detail? Which configuration on 
https://spark.apache.org/docs/2.4.5/configuration.html maps to partition size?

> History Server does not see new logs in S3
> --
>
> Key: SPARK-14561
> URL: https://issues.apache.org/jira/browse/SPARK-14561
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.1
> Reporter: Miles Crawford
> Priority: Major
> Labels: bulk-closed
>
> If you set the Spark history server to use a log directory with an s3a:// 
> url, everything appears to work fine at first, but new log files written by 
> applications are not picked up by the server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14561) History Server does not see new logs in S3

2016-10-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572401#comment-15572401
 ] 

Steve Loughran commented on SPARK-14561:


To clarify: it's not changes in existing files that aren't showing up, *it is 
new files added to the same destination directory*


If that's the case, something is up with the scanning.

# Set the logging of org.apache.spark.deploy.history.FsHistoryProvider to debug.
# Have a look at the scan interval. Is it too long?
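
A minimal sketch of both steps as configuration, assuming the History Server uses the log4j 1.x properties format that Spark 1.6 ships with and reads conf/log4j.properties and conf/spark-defaults.conf. The bucket path is a placeholder and the interval shown is only an example value, not a recommendation:

```properties
# conf/log4j.properties on the History Server host:
# log each directory scan performed by the history provider
log4j.logger.org.apache.spark.deploy.history.FsHistoryProvider=DEBUG

# conf/spark-defaults.conf on the History Server host
# (s3a://example-bucket/spark-events is a hypothetical location)
spark.history.fs.logDirectory      s3a://example-bucket/spark-events
# how often the provider rescans the log directory for new or updated logs
spark.history.fs.update.interval   10s
```

The History Server has to be restarted for either change to take effect.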




[jira] [Commented] (SPARK-14561) History Server does not see new logs in S3

2016-04-12 Thread Miles Crawford (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237525#comment-15237525
 ] 

Miles Crawford commented on SPARK-14561:


Steve Loughran on the user list says:
{quote}
S3 isn't a real filesystem, and apps writing to it don't have any data written 
until one of the following happens:
 - the output stream is close()'d. This happens at the end of the app.
 - the file is set up to be partitioned and a partition size is crossed.

Until one of those conditions is met, the history server isn't going to see 
anything.

If you are going to use s3 as the dest, and you want to see incomplete apps, 
then you'll need to configure the spark job to have smaller partition size (64? 
128? MB).

If it's completed apps that aren't being seen by the HS, then that's a bug, 
though if it's against S3 only, it is likely to be something related to 
directory listings.
{quote}

I agree - and it is only new, completed jobs that aren't showing up. If I 
restart the history server, it catches up and sees all the jobs.
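
The two visibility conditions in the quote can be illustrated with a toy model. This is only a sketch: ToyObjectStore and BufferedObjectWriter are hypothetical names, nothing here is S3A or Spark code, and the part-naming scheme is invented for the illustration. It shows that a listing sees nothing until a partition boundary is crossed or the stream is closed:

```python
class ToyObjectStore:
    """Hypothetical object store: an object is listable only once committed."""

    def __init__(self):
        self.objects = {}

    def list(self):
        return sorted(self.objects)


class BufferedObjectWriter:
    """Hypothetical writer that buffers locally and commits whole partitions."""

    def __init__(self, store, key, partition_size):
        self.store = store
        self.key = key
        self.partition_size = partition_size
        self._buffer = bytearray()
        self._part = 0
        self._closed = False

    def write(self, data):
        self._buffer.extend(data)
        # Condition 2 in the quote: a partition size is crossed.
        while len(self._buffer) >= self.partition_size:
            self._commit(self._buffer[:self.partition_size])
            del self._buffer[:self.partition_size]

    def close(self):
        # Condition 1 in the quote: the output stream is closed.
        if not self._closed:
            if self._buffer:
                self._commit(self._buffer)
                self._buffer = bytearray()
            self._closed = True

    def _commit(self, chunk):
        name = f"{self.key}.part-{self._part:04d}"
        self.store.objects[name] = bytes(chunk)
        self._part += 1


store = ToyObjectStore()
writer = BufferedObjectWriter(store, "app-1", partition_size=8)

writer.write(b"12345")
visible_while_buffering = store.list()   # nothing committed yet

writer.write(b"6789")
visible_after_boundary = store.list()    # first 8 bytes committed as one part

writer.close()
visible_after_close = store.list()       # remainder committed on close
```

In this model a scanner that lists the store while the application is still buffering finds no log at all, which matches the behaviour described for incomplete apps. It does not explain completed apps going unseen, which is the case reported in this issue.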
