Re: FW: Stage 152 contains a task of very large size (12747 KB). The maximum recommended task size is 100 KB

2019-04-25 Thread Russell Spitzer
I usually only see that when folks parallelize very large objects. From
what I know, it's really just the data inside the "Partition" class of
the RDD that is being sent back and forth, so the usual culprit is
something like sc.parallelize(Seq(reallyBigMap)). The parallelize
function jams all that data into the RDD's Partition metadata, which can
easily overwhelm the task size.
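
For the "what is actually big?" part of the question, here is a rough
sketch in Python, assuming a PySpark job where the suspects are ordinary
Python objects. It pickles each candidate and compares the result with
the 100 KB figure from the warning. The names serialized_sizes,
really_big_map and small_config are made up for illustration, and pickle
size is only a proxy: the JVM side serializes tasks differently, so treat
the numbers as a hint rather than the exact task size.

```python
import pickle

# Threshold taken from the warning text ("maximum recommended task size is 100 KB").
RECOMMENDED_TASK_BYTES = 100 * 1024


def serialized_sizes(candidates):
    """Return (name, pickled_size_in_bytes) pairs, largest first.

    `candidates` is a dict of name -> object you suspect is being
    captured by a closure or passed to parallelize.
    """
    sizes = [
        (name, len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)))
        for name, obj in candidates.items()
    ]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


# Hypothetical suspects: one large lookup table and one small config dict.
really_big_map = {i: "v" * 100 + str(i) for i in range(10_000)}
small_config = {"threshold": 0.5}

report = serialized_sizes(
    {"really_big_map": really_big_map, "small_config": small_config}
)
for name, size in report:
    flag = "LARGE" if size > RECOMMENDED_TASK_BYTES else "ok"
    print(f"{name}: {size / 1024:.0f} KB ({flag})")
```

Anything flagged LARGE is a candidate for a broadcast variable or for
loading on the executors instead of shipping it with every task.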

On Tue, Apr 23, 2019 at 3:57 PM Long, Andrew wrote:

> Hey Friends,
>
> Is there an easy way of figuring out what’s being pulled into the task
> context?  I’ve been getting the following message, which I suspect means
> I’ve unintentionally captured some large objects, but figuring out what
> those objects are is stumping me.
>
> 19/04/23 13:52:13 WARN org.apache.spark.internal.Logging$class
> TaskSetManager: Stage 152 contains a task of very large size (12747 KB).
> The maximum recommended task size is 100 KB
>
> Cheers Andrew
>


Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-04-25 Thread Josh Rosen
The code for this runs in http://spark-prs.appspot.com (see
https://github.com/databricks/spark-pr-dashboard/blob/1e799c9e510fa8cdc9a6c084a777436bebeabe10/sparkprs/controllers/tasks.py#L137
)

I checked the AppEngine logs and it looks like we're getting error
responses, possibly due to a credentials issue:

> Exception when starting progress on JIRA issue SPARK-27355
> (/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/controllers/tasks.py:142)
> Traceback (most recent call last):
>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/controllers/tasks.py", line 138, in update_pr
>     start_issue_progress("%s-%s" % (app.config['JIRA_PROJECT'], issue_number))
>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/jira_api.py", line 27, in start_issue_progress
>     jira_client = get_jira_client()
>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/jira_api.py", line 18, in get_jira_client
>     app.config['JIRA_PASSWORD']))
>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/client.py", line 472, in __init__
>     si = self.server_info()
>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/client.py", line 2133, in server_info
>     j = self._get_json('serverInfo')
>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/client.py", line 2549, in _get_json
>     r = self._session.get(url, params=params)
>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/resilientsession.py", line 151, in get
>     return self.__verb('GET', url, **kwargs)
>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/resilientsession.py", line 147, in __verb
>     raise_on_error(response, verb=verb, **kwargs)
>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/resilientsession.py", line 57, in raise_on_error
>     r.status_code, error, r.url, request=request, response=r, **kwargs)
> JIRAError: JiraError HTTP 403 url:
> https://issues.apache.org/jira/rest/api/2/serverInfo text:
> CAPTCHA_CHALLENGE; login-url=https://issues.apache.org/jira/login.jsp
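
The error text in that 403 has a "REASON; key=value" shape, so a sync job
could in principle tell a CAPTCHA lockout apart from other failures and
log something more actionable. Below is a minimal sketch of that idea. It
is not the code in spark-pr-dashboard; parse_denied_reason and
is_captcha_lockout are hypothetical helpers, and the only input format
assumed is the one visible in the log above.

```python
def parse_denied_reason(text):
    """Split 'REASON; key=value; ...' into (reason, params).

    Returns ("", {}) for an empty string.
    """
    parts = [p.strip() for p in text.split(";") if p.strip()]
    reason = parts[0] if parts else ""
    params = {}
    for part in parts[1:]:
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return reason, params


def is_captcha_lockout(status_code, text):
    """True when an HTTP 403 carries the CAPTCHA_CHALLENGE reason."""
    reason, _ = parse_denied_reason(text)
    return status_code == 403 and reason == "CAPTCHA_CHALLENGE"


# The text portion of the error from the log above.
error_text = (
    "CAPTCHA_CHALLENGE; login-url=https://issues.apache.org/jira/login.jsp"
)
reason, params = parse_denied_reason(error_text)
if is_captcha_lockout(403, error_text):
    print("Log in manually at", params.get("login-url"), "to clear the CAPTCHA")
```

If that is what is going on here, retrying from the job will not help;
someone has to log in through the browser as the sync account once.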

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-04-25 Thread Hyukjin Kwon
Thank you so much Josh .. !!

On Thu, Apr 25, 2019 at 3:04 PM, Josh Rosen wrote:

> The code for this runs in http://spark-prs.appspot.com (see
> https://github.com/databricks/spark-pr-dashboard/blob/1e799c9e510fa8cdc9a6c084a777436bebeabe10/sparkprs/controllers/tasks.py#L137
> )
>
> I checked the AppEngine logs and it looks like we're getting error
> responses, possibly due to a credentials issue:
>
>> Exception when starting progress on JIRA issue SPARK-27355
>> (/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/controllers/tasks.py:142)
>> Traceback (most recent call last):
>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/controllers/tasks.py", line 138, in update_pr
>>     start_issue_progress("%s-%s" % (app.config['JIRA_PROJECT'], issue_number))
>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/jira_api.py", line 27, in start_issue_progress
>>     jira_client = get_jira_client()
>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/sparkprs/jira_api.py", line 18, in get_jira_client
>>     app.config['JIRA_PASSWORD']))
>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/client.py", line 472, in __init__
>>     si = self.server_info()
>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/client.py", line 2133, in server_info
>>     j = self._get_json('serverInfo')
>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/client.py", line 2549, in _get_json
>>     r = self._session.get(url, params=params)
>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/resilientsession.py", line 151, in get
>>     return self.__verb('GET', url, **kwargs)
>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/resilientsession.py", line 147, in __verb
>>     raise_on_error(response, verb=verb, **kwargs)
>>   File "/base/data/home/apps/s~spark-prs/live.412416057856832734/lib/jira/resilientsession.py", line 57, in raise_on_error
>>     r.status_code, error, r.url, request=request, response=r, **kwargs)
>> JIRAError: JiraError HTTP 403 url:
>>