Re: jenkins is going down NOW -- POWER OUTAGE DUE TO FIRE

2017-08-02 Thread shane knapp
alright, things are looking... better. our sysadmin jon got the workers up, and things seem to be happily building. i'll check build results tomorrow morning when i get to the office and make sure that things are behaving as expected. sorry again for the drama... it's been a crazy couple of mo

Re: Some PRs not automatically linked to JIRAs

2017-08-02 Thread Hyukjin Kwon
I think it works for anyone who can leave a web link and comment. For the "in progress" resolution, it looks like I am unable to set it manually. Please let me know if anyone knows how. For a single JIRA, I manually modified the script to process a single item before. I guess you know what the script does but want

Re: jenkins is going down NOW -- POWER OUTAGE DUE TO FIRE

2017-08-02 Thread shane knapp
we just got the all clear, and power was not cut off. however, the remote consoles on most of the workers aren't working, and i can't currently power them back on. right now we've got the two staging/ubuntu workers up, as well as one generic centos node. more updates as they come. On Wed,

Re: Reparitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-02 Thread Holden Karau
The memory overhead is based less on the total amount of data and more on what you end up doing with the data (e.g. if you're doing a lot of off-heap processing or using Python, you need to increase it). Honestly most people find this number for their job "experimentally" (e.g. they try a few differen
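As a starting point for that experiment: Spark 2.x on YARN derives the default overhead as the larger of 384 MB and 10% of executor memory. A minimal sketch of that formula (the helper name and the 0.20 "bumped" factor are illustrative, not Spark API):

```python
def yarn_memory_overhead_mb(executor_memory_mb, factor=0.10, floor_mb=384):
    """Spark 2.x default for spark.yarn.executor.memoryOverhead:
    max(384 MB, 10% of executor memory)."""
    return max(floor_mb, int(executor_memory_mb * factor))

# 8 GB executors -> 819 MB default overhead; off-heap-heavy or PySpark
# jobs often need a larger factor, tuned experimentally from here.
default = yarn_memory_overhead_mb(8192)        # 819
bumped = yarn_memory_overhead_mb(8192, 0.20)   # 1638
```

The value is then passed as `--conf spark.yarn.executor.memoryOverhead=<MB>` on spark-submit.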

jenkins is going down NOW -- POWER OUTAGE DUE TO FIRE

2017-08-02 Thread shane knapp
we have a massive fire in the hills behind campus, and PG&E is shutting down all of the transformers on campus as a precaution. this will impact jenkins. i will be shutting down the workers immediately. http://www.berkeleyside.com/2017/08/02/crews-respond-wildland-fire-east-bay-hills/ https://l

Re: Reparitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-02 Thread Chetan Khatri
Ryan, Thank you for the reply. For 2 TB of data, what should be the value of spark.yarn.executor.memoryOverhead? With regard to this, I see an issue at Spark https://issues.apache.org/jira/browse/SPARK-18787 , not sure whether it works or not at Spark 2.0.1! Can you elaborate more on spark.memor

Re: Some PRs not automatically linked to JIRAs

2017-08-02 Thread Josh Rosen
Usually the backend of https://spark-prs.appspot.com does the linking while processing PR update tasks. It appears that the site's connections to JIRA have started failing: ConnectionError: ('Connection aborted.', HTTPException('Deadline exceeded while waiting for HTTP response from URL: https://i

Re: Some PRs not automatically linked to JIRAs

2017-08-02 Thread Bryan Cutler
Thanks Hyukjin! I didn't see your previous message. It looks like your manual run worked pretty well for the JIRAs I'm following; the only thing is that it didn't mark them as "in progress", but that's not a big deal. Otherwise that helps until we can find out why it's not doing this automatical

Re: Some PRs not automatically linked to JIRAs

2017-08-02 Thread Hyukjin Kwon
I was wondering about this too. Yes, actually, I have been manually adding some links by following the same steps as the script before. I was thinking it'd be nicer to run this manually once, so I ran this against a single JIRA first - https://issues.apache.org/jira/browse/SPARK-215

Re: Some PRs not automatically linked to JIRAs

2017-08-02 Thread Sean Owen
Hyukjin mentioned this here earlier today and had run it manually, but yeah, I'm not sure where it normally runs or why it hasn't. Shane, not sure if you're the person to ask? On Wed, Aug 2, 2017 at 7:47 PM Bryan Cutler wrote: > Hi Devs, > > I've noticed a couple PRs recently have not been automat

Some PRs not automatically linked to JIRAs

2017-08-02 Thread Bryan Cutler
Hi Devs, I've noticed a couple PRs recently have not been automatically linked to the related JIRAs. This was one of mine (I linked it manually) https://issues.apache.org/jira/browse/SPARK-21583, but I've seen it happen elsewhere. I think this is the script that does it, but it hasn't been chang

Re: Reparitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-02 Thread Ryan Blue
Chetan, When you're writing to a partitioned table, you want to use a shuffle to avoid the situation where each task has to write to every partition. You can do that either by adding a repartition by your table's partition keys, or by adding an order by with the partition keys and then columns you
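Ryan's point can be illustrated with a toy model (hypothetical data, not the Spark API): without a shuffle, each task writes one file per partition key it holds, so the file count approaches tasks × partitions; repartitioning by the table's partition keys concentrates each key into one task, leaving roughly one file per partition:

```python
def files_written(task_rows):
    # Each task writes one file per distinct partition key it holds.
    return sum(len(set(rows)) for rows in task_rows)

# 4 tasks with rows spread across 3 date partitions (hypothetical data):
unshuffled = [["d1", "d2", "d3"], ["d1", "d2", "d3"], ["d1", "d3"], ["d2", "d3"]]
# After repartitioning by the partition key, each key lands in one task:
shuffled = [["d1"] * 3, ["d2"] * 3, ["d3"] * 4, []]

many = files_written(unshuffled)  # 10 files
few = files_written(shuffled)     # 3 files
```

In Spark itself, one reasonable reading of the advice is `df.repartition(col("date")).write.partitionBy("date")...` before the save.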

Re: Reparitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-02 Thread Chetan Khatri
Can anyone please guide me with the above issue. On Wed, Aug 2, 2017 at 6:28 PM, Chetan Khatri wrote: > Hello Spark Users, > > I am reading an HBase table and writing to a Hive managed table where I > applied partitioning by a date column, which worked fine but has generated > a large number of files in a

Question about manually running dev/github_jira_sync.py

2017-08-02 Thread Hyukjin Kwon
Hi all, I lately realised that we seem to have some problem between wherever dev/github_jira_sync.py executes and JIRA. So, I see committers or issue reporters manually leaving PR links in their JIRAs, or multiple PRs being opened for the same JIRA. Would anyone mind if I manually run this script? I ran