There is no space for new record
Hi, I am facing the following error when running on EMR:

    Caused by: java.lang.IllegalStateException: There is no space for new record
        at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.insertRecord(UnsafeInMemorySorter.java:226)
        at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:132)
        at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:250)

I am using Spark 2.2. What Spark configuration should be changed/modified to get this resolved?

Regards,
Snehasish
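This exception is raised when the in-memory sorter backing a hash aggregation cannot allocate space for another record, which is typically a sign of memory pressure in the task. A common first step is to give executors more memory and split the aggregation across more shuffle partitions so each task handles less data at once. A minimal spark-submit sketch; the values and your-app.jar are illustrative placeholders, not recommendations:

    # give executors more memory, and split aggregations across more tasks
    spark-submit \
      --conf spark.executor.memory=8g \
      --conf spark.sql.shuffle.partitions=400 \
      your-app.jar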
Re: File JIRAs for all flaky test failures
Hey all,

I just wanted to bring up Kay's old e-mail about this. If you see a flaky test during a PR, don't just ask for a re-test. File a bug so that we know that test is flaky and someone will eventually take a look at it. A lot of them also make great newbie bugs.

I've filed a bunch of these in the past months, and every time I looked for the test in jira, there was nothing filed yet. And most of those ended up fixed. Visibility into these things helps get them fixed.

On Wed, Feb 15, 2017 at 12:10 PM, Kay Ousterhout wrote:
> Hi all,
>
> I've noticed the Spark tests getting increasingly flaky -- it seems more
> common than not now that the tests need to be re-run at least once on PRs
> before they pass. This is both annoying and problematic because it makes it
> harder to tell when a PR is introducing new flakiness.
>
> To try to clean this up, I'd propose filing a JIRA *every time* Jenkins
> fails on a PR (for a reason unrelated to the PR). Just provide a quick
> description of the failure -- e.g., "Flaky test: DagSchedulerSuite" or
> "Tests failed because 250m timeout expired", a link to the failed build, and
> include the "Tests" component. If there's already a JIRA for the issue,
> just comment with a link to the latest failure. I know folks don't always
> have time to track down why a test failed, but this is at least helpful to
> someone else who, later on, is trying to diagnose when the issue started and
> to find the problematic code / test.
>
> If this seems like too high overhead, feel free to suggest alternative ways
> to make the tests less flaky!
>
> -Kay

--
Marcelo
Re: Difficulties building spark-master with sbt
Thanks for the answer, but that doesn't solve my problem. cmd doesn't recognize ./build/sbt ('.\build\sbt' is not recognized as an internal or external command, operable program or batch file.), even when the full path to the sbt file is specified.

I just realized that I haven't mentioned that I'm running this on Windows. Does that make a difference?
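It likely does: build/sbt is a bash script, so cmd.exe cannot execute it regardless of which path is used. One possible workaround, assuming Git Bash (or WSL) is installed; the S:\spark-master location is the one from this thread:

    # run from a bash-capable shell such as Git Bash, not from cmd.exe
    cd /s/spark-master    # Git Bash spelling of S:\spark-master
    ./build/sbt -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 \
        -Phive -Phive-thriftserver clean package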
Re: Drop the Hadoop 2.6 profile?
wire compatibility is relevant if hadoop is included in the spark build. for those of us that build spark without hadoop included, hadoop (binary) api compatibility matters. i wouldn't want to build against hadoop 2.7 and deploy on hadoop 2.6, but i am ok the other way around. so to get compatibility with all the major distros and cloud providers, building against hadoop 2.6 is currently the way to go.

On Thu, Feb 8, 2018 at 5:09 PM, Marcelo Vanzin wrote:
> I think it would make sense to drop one of them, but not necessarily 2.6.
>
> It kinda depends on what wire compatibility guarantees the Hadoop
> libraries have; can a 2.6 client talk to 2.7 (pretty certain it can)?
> Is the opposite safe (not sure)?
>
> If the answer to the latter question is "no", then keeping 2.6 and
> dropping 2.7 makes more sense. Those who really want a
> Hadoop-version-specific package can override the needed versions in
> the command line, or use the "without hadoop" package.
>
> But in the context of trying to support 3.0 it makes sense to drop one
> of them, at least from jenkins.
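For concreteness, a sketch of the two build modes discussed here, using the hadoop profiles as they existed in the Spark 2.x build; the 2.6.5 patch version is illustrative:

    # build against hadoop 2.6 explicitly
    ./build/mvn -Phadoop-2.6 -Dhadoop.version=2.6.5 -DskipTests clean package

    # or build a "without hadoop" distribution and supply hadoop at deploy time
    ./dev/make-distribution.sh --name hadoop-provided -Phadoop-provided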
Re: Drop the Hadoop 2.6 profile?
I think it would make sense to drop one of them, but not necessarily 2.6.

It kinda depends on what wire compatibility guarantees the Hadoop libraries have; can a 2.6 client talk to 2.7 (pretty certain it can)? Is the opposite safe (not sure)?

If the answer to the latter question is "no", then keeping 2.6 and dropping 2.7 makes more sense. Those who really want a Hadoop-version-specific package can override the needed versions in the command line, or use the "without hadoop" package.

But in the context of trying to support 3.0 it makes sense to drop one of them, at least from jenkins.

On Thu, Feb 8, 2018 at 2:03 PM, Sean Owen wrote:
> That would still work with a Hadoop-2.7-based profile, as there isn't
> actually any code difference in Spark that treats the two versions
> differently (nor, really, much different between 2.6 and 2.7 to begin with).
> This practice of different profile builds was pretty unnecessary after 2.2;
> it's mostly vestigial now.

--
Marcelo
Re: Drop the Hadoop 2.6 profile?
oh nevermind, i am used to spark builds without hadoop included. but i realize that if hadoop is included it matters if it's 2.6 or 2.7...

On Thu, Feb 8, 2018 at 5:06 PM, Koert Kuipers wrote:
> wouldn't a hadoop 2.7 profile mean someone could accidentally introduce
> usage of some hadoop apis that don't exist in hadoop 2.6?
>
> why not keep 2.6 and ditch 2.7, given that hadoop 2.7 is backwards
> compatible with 2.6? what is the added value of having a 2.7 profile?
Re: Drop the Hadoop 2.6 profile?
wouldn't a hadoop 2.7 profile mean someone could accidentally introduce usage of some hadoop apis that don't exist in hadoop 2.6?

why not keep 2.6 and ditch 2.7, given that hadoop 2.7 is backwards compatible with 2.6? what is the added value of having a 2.7 profile?

On Thu, Feb 8, 2018 at 5:03 PM, Sean Owen wrote:
> That would still work with a Hadoop-2.7-based profile, as there isn't
> actually any code difference in Spark that treats the two versions
> differently (nor, really, much different between 2.6 and 2.7 to begin
> with). This practice of different profile builds was pretty unnecessary
> after 2.2; it's mostly vestigial now.
Re: Drop the Hadoop 2.6 profile?
That would still work with a Hadoop-2.7-based profile, as there isn't actually any code difference in Spark that treats the two versions differently (nor, really, much different between 2.6 and 2.7 to begin with). This practice of different profile builds was pretty unnecessary after 2.2; it's mostly vestigial now.

On Thu, Feb 8, 2018 at 3:57 PM Koert Kuipers wrote:
> CDH 5 is still based on hadoop 2.6
Re: Drop the Hadoop 2.6 profile?
CDH 5 is still based on hadoop 2.6

On Thu, Feb 8, 2018 at 2:03 PM, Sean Owen wrote:
> Mostly just shedding the extra build complexity, and builds. The primary
> little annoyance is it's 2x the number of flaky build failures to examine.
> I suppose it allows using a 2.7+-only feature, but outside of YARN, not
> sure there is anything compelling.
>
> It's something that probably gains us virtually nothing now, but isn't too
> painful either.
> I think it will not make sense to distinguish them once any Hadoop
> 3-related support comes into the picture, and maybe that will start soon;
> there were some more pings on related JIRAs this week. You could view it as
> early setup for that move.
Re: Drop the Hadoop 2.6 profile?
Mostly just shedding the extra build complexity, and builds. The primary little annoyance is it's 2x the number of flaky build failures to examine. I suppose it allows using a 2.7+-only feature, but outside of YARN, not sure there is anything compelling.

It's something that probably gains us virtually nothing now, but isn't too painful either. I think it will not make sense to distinguish them once any Hadoop 3-related support comes into the picture, and maybe that will start soon; there were some more pings on related JIRAs this week. You could view it as early setup for that move.

On Thu, Feb 8, 2018 at 12:57 PM Reynold Xin wrote:
> Does it gain us anything to drop 2.6?
Re: Drop the Hadoop 2.6 profile?
Does it gain us anything to drop 2.6?

> On Feb 8, 2018, at 10:50 AM, Sean Owen wrote:
>
> At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both fairly old,
> and actually, not different from 2.7 with respect to Spark. That is, I don't
> know if we are actually maintaining anything here but a separate profile and
> 2x the number of test builds.
>
> The cost is, by the same token, low. However I'm floating the idea of
> removing the 2.6 profile and just requiring 2.7+ as of Spark 2.4?
Drop the Hadoop 2.6 profile?
At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both fairly old, and actually, not different from 2.7 with respect to Spark. That is, I don't know if we are actually maintaining anything here but a separate profile and 2x the number of test builds. The cost is, by the same token, low. However I'm floating the idea of removing the 2.6 profile and just requiring 2.7+ as of Spark 2.4?
Re: Difficulties building spark-master with sbt
Hi,

s,sbt ./build/sbt,./build/sbt

In other words, don't execute sbt with ./build/sbt as an argument, but run ./build/sbt itself (you don't even have to install sbt to build spark, as it's included in the repo and the script uses it internally).

Regards,
Jacek Laskowski

https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski

On Thu, Feb 8, 2018 at 12:12 AM, ds wrote:
> After cloning today's version of spark-master, I ran the following command:
>
> S:\spark-master>sbt ./build/sbt -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive -Phive-thriftserver clean package
>
> with the intention of building both the source and test projects and
> generating the corresponding .jar files.
>
> The script started regularly, but ultimately failed with the following log
> excerpt:
>
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m;
> support was removed in 8.0
> [info] Loading project definition from S:\spark-master\project
> [info] Resolving key references (16939 settings) ...
> [info] Set current project to spark-parent (in build file:/S:/spark-master/)
> [error] Expected letter
> [error] Expected symbol
> [error] Expected '!'
> [error] Expected '+'
> [error] Expected '++'
> [error] Expected '^'
> [error] Expected '^^'
> [error] Expected 'debug'
> [error] Expected 'info'
> [error] Expected 'warn'
> [error] Expected 'error'
> [error] Expected ';'
> [error] Expected end of input.
> [error] Expected 'early('
> [error] Expected '-'
> [error] Expected '--'
> [error] Expected 'show'
> [error] Expected 'all'
> [error] Expected '*'
> [error] Expected '{'
> [error] Expected project ID
> [error] Expected configuration
> [error] Expected key
> [error] ./build/sbt
> [error] ^
>
> I tried to follow the instructions found at
> http://www.sparktutorials.net/building-apache-spark-on-your-local-machine
> to the best of my understanding, but I don't know how to interpret the error
> and where to begin the troubleshooting.
>
> I'm using eclipse as my IDE, so both scala and java seem to be set up
> properly. Although, after running out of options, I simply ran
> S:\spark-master>sbt compile, which failed with these errors (I don't know
> whether this is relevant):
>
> [error] (core/compile:managedResources) java.io.IOException: Cannot run
> program "bash": CreateProcess error=2, The system cannot find the file
> specified
> [error] (network-common/compile:compileIncremental) java.io.IOException:
> Cannot run program "S:\Program Files\Java\bin\javac" (in directory
> "S:\spark-master"): CreateProcess error=2, The system cannot find the file
> specified
> [error] (tags/compile:compileIncremental) java.io.IOException: Cannot run
> program "S:\Program Files\Java\bin\javac" (in directory "S:\spark-master"):
> CreateProcess error=2, The system cannot find the file specified
>
> Note that javac is located in S:\Program Files\Java\jdk1.8.0_77\bin.
>
> So, I would appreciate help in building and packaging the src and test
> components of the spark source.
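Spelled out, the corrected invocation, using exactly the flags from the original command and run from the spark-master checkout, would be:

    # invoke the bundled script directly instead of passing it to a standalone sbt
    ./build/sbt -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 \
        -Phive -Phive-thriftserver clean package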