There is no space for new record

2018-02-08 Thread SNEHASISH DUTTA
Hi,

I am facing the following exception when running on EMR:

Caused by: java.lang.IllegalStateException: There is no space for new record
at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.insertRecord(UnsafeInMemorySorter.java:226)
at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:132)
at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:250)

I am using Spark 2.2. Which Spark configuration should be changed or modified
to resolve this?
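
(As a rough illustration only: a sketch of memory-related settings that are
commonly tuned first for execution-memory pressure on YARN/EMR. The values,
application class, and jar below are placeholders, and none of this is
guaranteed to resolve this particular exception; if the job already has ample
memory, the error may be an upstream issue rather than a tuning problem.)

    # Hypothetical example -- placeholder values, class, and jar
    spark-submit \
      --conf spark.executor.memory=8g \
      --conf spark.yarn.executor.memoryOverhead=2048 \
      --conf spark.sql.shuffle.partitions=400 \
      --class com.example.MyApp myapp.jar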


Regards,
Snehasish


Re: File JIRAs for all flaky test failures

2018-02-08 Thread Marcelo Vanzin
Hey all,

I just wanted to bring up Kay's old e-mail about this.

If you see a flaky test during a PR, don't just ask for a re-test.
File a bug so that we know that test is flaky and someone will
eventually take a look at it. A lot of them also make great newbie
bugs.

I've filed a bunch of these in the past months, and every time I looked
for the test in JIRA, there was nothing filed yet. And most of those
ended up fixed. Visibility into these things helps get them fixed.


On Wed, Feb 15, 2017 at 12:10 PM, Kay Ousterhout
 wrote:
> Hi all,
>
> I've noticed the Spark tests getting increasingly flaky -- it seems more
> common than not now that the tests need to be re-run at least once on PRs
> before they pass.  This is both annoying and problematic because it makes it
> harder to tell when a PR is introducing new flakiness.
>
> To try to clean this up, I'd propose filing a JIRA *every time* Jenkins
> fails on a PR (for a reason unrelated to the PR).  Just provide a quick
> description of the failure -- e.g., "Flaky test: DagSchedulerSuite" or
> "Tests failed because 250m timeout expired", a link to the failed build, and
> include the "Tests" component.  If there's already a JIRA for the issue,
> just comment with a link to the latest failure.  I know folks don't always
> have time to track down why a test failed, but this is at least helpful to
> someone else who, later on, is trying to diagnose when the issue started and
> to find the problematic code / test.
>
> If this seems like too high overhead, feel free to suggest alternative ways
> to make the tests less flaky!
>
> -Kay



-- 
Marcelo




Re: Difficulties building spark-master with sbt

2018-02-08 Thread ds
Thanks for the answer, but that doesn't solve my problem. The cmd prompt
doesn't recognize ./build/sbt ('.\build\sbt' is not recognized as an internal
or external command, operable program or batch file.), even when the full
path to the sbt file is specified.

I just realized that I hadn't mentioned that I'm running this on Windows;
does that make a difference?
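
(A sketch of one possible workaround, assuming Git Bash or WSL is available:
build/sbt is a bash script rather than a batch file, so it has to be run from
a bash shell. The /s/spark-master path is a placeholder mirroring
S:\spark-master under Git Bash.)

    # Run from Git Bash/WSL -- build/sbt will not run from a plain cmd prompt
    cd /s/spark-master
    ./build/sbt -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive -Phive-thriftserver clean package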







Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Koert Kuipers
wire compatibility is relevant if hadoop is included in the spark build


for those of us that build spark without hadoop included, hadoop (binary)
api compatibility is what matters. i wouldn't want to build against hadoop 2.7
and deploy on hadoop 2.6, but i am ok with the other way around. so to get
compatibility with all the major distros and cloud providers, building
against hadoop 2.6 is currently the way to go.
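
(As a rough sketch of the "hadoop not included" build being described here,
assuming the Maven build and its hadoop-provided profile; exact flags may
vary by Spark version, so this is not an exact recipe.)

    # "Hadoop free" build sketch -- Hadoop classes come from the cluster at runtime
    ./build/mvn -Pyarn -Phadoop-provided -DskipTests clean package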


On Thu, Feb 8, 2018 at 5:09 PM, Marcelo Vanzin  wrote:

> I think it would make sense to drop one of them, but not necessarily 2.6.
>
> It kinda depends on what wire compatibility guarantees the Hadoop
> libraries have; can a 2.6 client talk to 2.7 (pretty certain it can)?
> Is the opposite safe (not sure)?
>
> If the answer to the latter question is "no", then keeping 2.6 and
> dropping 2.7 makes more sense. Those who really want a
> Hadoop-version-specific package can override the needed versions in
> the command line, or use the "without hadoop" package.
>
> But in the context of trying to support 3.0 it makes sense to drop one
> of them, at least from jenkins.
>
>
> On Thu, Feb 8, 2018 at 2:03 PM, Sean Owen  wrote:
> > That would still work with a Hadoop-2.7-based profile, as there isn't
> > actually any code difference in Spark that treats the two versions
> > differently (nor, really, much different between 2.6 and 2.7 to begin
> with).
> > This practice of different profile builds was pretty unnecessary after
> 2.2;
> > it's mostly vestigial now.
> >
> > On Thu, Feb 8, 2018 at 3:57 PM Koert Kuipers  wrote:
> >>
> >> CDH 5 is still based on hadoop 2.6
> >>
> >> On Thu, Feb 8, 2018 at 2:03 PM, Sean Owen  wrote:
> >>>
> >>> Mostly just shedding the extra build complexity, and builds. The
> primary
> >>> little annoyance is it's 2x the number of flaky build failures to
> examine.
> >>> I suppose it allows using a 2.7+-only feature, but outside of YARN, not
> >>> sure there is anything compelling.
> >>>
> >>> It's something that probably gains us virtually nothing now, but isn't
> >>> too painful either.
> >>> I think it will not make sense to distinguish them once any Hadoop
> >>> 3-related support comes into the picture, and maybe that will start
> soon;
> >>> there were some more pings on related JIRAs this week. You could view
> it as
> >>> early setup for that move.
> >>>
> >>>
> >>> On Thu, Feb 8, 2018 at 12:57 PM Reynold Xin 
> wrote:
> 
>  Does it gain us anything to drop 2.6?
> 
>  > On Feb 8, 2018, at 10:50 AM, Sean Owen  wrote:
>  >
>  > At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both
>  > fairly old, and actually, not different from 2.7 with respect to
> Spark. That
>  > is, I don't know if we are actually maintaining anything here but a
> separate
>  > profile and 2x the number of test builds.
>  >
>  > The cost is, by the same token, low. However I'm floating the idea
> of
>  > removing the 2.6 profile and just requiring 2.7+ as of Spark 2.4?
> >>
> >>
> >
>
>
>
> --
> Marcelo
>


Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Marcelo Vanzin
I think it would make sense to drop one of them, but not necessarily 2.6.

It kinda depends on what wire compatibility guarantees the Hadoop
libraries have; can a 2.6 client talk to 2.7 (pretty certain it can)?
Is the opposite safe (not sure)?

If the answer to the latter question is "no", then keeping 2.6 and
dropping 2.7 makes more sense. Those who really want a
Hadoop-version-specific package can override the needed versions in
the command line, or use the "without hadoop" package.

But in the context of trying to support Hadoop 3.0, it makes sense to drop
one of them, at least from Jenkins.
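
(A sketch of the command-line override mentioned above, using the profile and
property names from the 2.x Maven build; the Hadoop version is a placeholder.)

    # Override the Hadoop version on the command line (placeholder version)
    ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package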


On Thu, Feb 8, 2018 at 2:03 PM, Sean Owen  wrote:
> That would still work with a Hadoop-2.7-based profile, as there isn't
> actually any code difference in Spark that treats the two versions
> differently (nor, really, much different between 2.6 and 2.7 to begin with).
> This practice of different profile builds was pretty unnecessary after 2.2;
> it's mostly vestigial now.
>
> On Thu, Feb 8, 2018 at 3:57 PM Koert Kuipers  wrote:
>>
>> CDH 5 is still based on hadoop 2.6
>>
>> On Thu, Feb 8, 2018 at 2:03 PM, Sean Owen  wrote:
>>>
>>> Mostly just shedding the extra build complexity, and builds. The primary
>>> little annoyance is it's 2x the number of flaky build failures to examine.
>>> I suppose it allows using a 2.7+-only feature, but outside of YARN, not
>>> sure there is anything compelling.
>>>
>>> It's something that probably gains us virtually nothing now, but isn't
>>> too painful either.
>>> I think it will not make sense to distinguish them once any Hadoop
>>> 3-related support comes into the picture, and maybe that will start soon;
>>> there were some more pings on related JIRAs this week. You could view it as
>>> early setup for that move.
>>>
>>>
>>> On Thu, Feb 8, 2018 at 12:57 PM Reynold Xin  wrote:

 Does it gain us anything to drop 2.6?

 > On Feb 8, 2018, at 10:50 AM, Sean Owen  wrote:
 >
 > At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both
 > fairly old, and actually, not different from 2.7 with respect to Spark. 
 > That
 > is, I don't know if we are actually maintaining anything here but a 
 > separate
 > profile and 2x the number of test builds.
 >
 > The cost is, by the same token, low. However I'm floating the idea of
 > removing the 2.6 profile and just requiring 2.7+ as of Spark 2.4?
>>
>>
>



-- 
Marcelo




Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Koert Kuipers
oh never mind, i am used to spark builds without hadoop included. but i
realize that if hadoop is included, it matters whether it's 2.6 or 2.7...

On Thu, Feb 8, 2018 at 5:06 PM, Koert Kuipers  wrote:

> wouldn't a hadoop 2.7 profile mean someone could accidentally introduce
> usage of some hadoop apis that don't exist in hadoop 2.6?
>
> why not keep 2.6 and ditch 2.7 given that hadoop 2.7 is backwards
> compatible with 2.6? what is the added value of having a 2.7 profile?
>
> On Thu, Feb 8, 2018 at 5:03 PM, Sean Owen  wrote:
>
>> That would still work with a Hadoop-2.7-based profile, as there isn't
>> actually any code difference in Spark that treats the two versions
>> differently (nor, really, much different between 2.6 and 2.7 to begin
>> with). This practice of different profile builds was pretty unnecessary
>> after 2.2; it's mostly vestigial now.
>>
>> On Thu, Feb 8, 2018 at 3:57 PM Koert Kuipers  wrote:
>>
>>> CDH 5 is still based on hadoop 2.6
>>>
>>> On Thu, Feb 8, 2018 at 2:03 PM, Sean Owen  wrote:
>>>
 Mostly just shedding the extra build complexity, and builds. The
 primary little annoyance is it's 2x the number of flaky build failures to
 examine.
 I suppose it allows using a 2.7+-only feature, but outside of YARN, not
 sure there is anything compelling.

 It's something that probably gains us virtually nothing now, but isn't
 too painful either.
 I think it will not make sense to distinguish them once any Hadoop
 3-related support comes into the picture, and maybe that will start soon;
 there were some more pings on related JIRAs this week. You could view it as
 early setup for that move.


 On Thu, Feb 8, 2018 at 12:57 PM Reynold Xin 
 wrote:

> Does it gain us anything to drop 2.6?
>
> > On Feb 8, 2018, at 10:50 AM, Sean Owen  wrote:
> >
> > At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both
> fairly old, and actually, not different from 2.7 with respect to Spark.
> That is, I don't know if we are actually maintaining anything here but a
> separate profile and 2x the number of test builds.
> >
> > The cost is, by the same token, low. However I'm floating the idea
> of removing the 2.6 profile and just requiring 2.7+ as of Spark 2.4?
>

>>>
>


Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Koert Kuipers
wouldn't a hadoop 2.7 profile mean someone could accidentally introduce usage
of some hadoop apis that don't exist in hadoop 2.6?

why not keep 2.6 and ditch 2.7 given that hadoop 2.7 is backwards
compatible with 2.6? what is the added value of having a 2.7 profile?

On Thu, Feb 8, 2018 at 5:03 PM, Sean Owen  wrote:

> That would still work with a Hadoop-2.7-based profile, as there isn't
> actually any code difference in Spark that treats the two versions
> differently (nor, really, much different between 2.6 and 2.7 to begin
> with). This practice of different profile builds was pretty unnecessary
> after 2.2; it's mostly vestigial now.
>
> On Thu, Feb 8, 2018 at 3:57 PM Koert Kuipers  wrote:
>
>> CDH 5 is still based on hadoop 2.6
>>
>> On Thu, Feb 8, 2018 at 2:03 PM, Sean Owen  wrote:
>>
>>> Mostly just shedding the extra build complexity, and builds. The primary
>>> little annoyance is it's 2x the number of flaky build failures to examine.
>>> I suppose it allows using a 2.7+-only feature, but outside of YARN, not
>>> sure there is anything compelling.
>>>
>>> It's something that probably gains us virtually nothing now, but isn't
>>> too painful either.
>>> I think it will not make sense to distinguish them once any Hadoop
>>> 3-related support comes into the picture, and maybe that will start soon;
>>> there were some more pings on related JIRAs this week. You could view it as
>>> early setup for that move.
>>>
>>>
>>> On Thu, Feb 8, 2018 at 12:57 PM Reynold Xin  wrote:
>>>
 Does it gain us anything to drop 2.6?

 > On Feb 8, 2018, at 10:50 AM, Sean Owen  wrote:
 >
 > At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both
 fairly old, and actually, not different from 2.7 with respect to Spark.
 That is, I don't know if we are actually maintaining anything here but a
 separate profile and 2x the number of test builds.
 >
 > The cost is, by the same token, low. However I'm floating the idea of
 removing the 2.6 profile and just requiring 2.7+ as of Spark 2.4?

>>>
>>


Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Sean Owen
That would still work with a Hadoop-2.7-based profile, as there isn't
actually any code difference in Spark that treats the two versions
differently (nor, really, much different between 2.6 and 2.7 to begin
with). This practice of different profile builds was pretty unnecessary
after 2.2; it's mostly vestigial now.

On Thu, Feb 8, 2018 at 3:57 PM Koert Kuipers  wrote:

> CDH 5 is still based on hadoop 2.6
>
> On Thu, Feb 8, 2018 at 2:03 PM, Sean Owen  wrote:
>
>> Mostly just shedding the extra build complexity, and builds. The primary
>> little annoyance is it's 2x the number of flaky build failures to examine.
>> I suppose it allows using a 2.7+-only feature, but outside of YARN, not
>> sure there is anything compelling.
>>
>> It's something that probably gains us virtually nothing now, but isn't
>> too painful either.
>> I think it will not make sense to distinguish them once any Hadoop
>> 3-related support comes into the picture, and maybe that will start soon;
>> there were some more pings on related JIRAs this week. You could view it as
>> early setup for that move.
>>
>>
>> On Thu, Feb 8, 2018 at 12:57 PM Reynold Xin  wrote:
>>
>>> Does it gain us anything to drop 2.6?
>>>
>>> > On Feb 8, 2018, at 10:50 AM, Sean Owen  wrote:
>>> >
>>> > At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both
>>> fairly old, and actually, not different from 2.7 with respect to Spark.
>>> That is, I don't know if we are actually maintaining anything here but a
>>> separate profile and 2x the number of test builds.
>>> >
>>> > The cost is, by the same token, low. However I'm floating the idea of
>>> removing the 2.6 profile and just requiring 2.7+ as of Spark 2.4?
>>>
>>
>


Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Koert Kuipers
CDH 5 is still based on hadoop 2.6

On Thu, Feb 8, 2018 at 2:03 PM, Sean Owen  wrote:

> Mostly just shedding the extra build complexity, and builds. The primary
> little annoyance is it's 2x the number of flaky build failures to examine.
> I suppose it allows using a 2.7+-only feature, but outside of YARN, not
> sure there is anything compelling.
>
> It's something that probably gains us virtually nothing now, but isn't too
> painful either.
> I think it will not make sense to distinguish them once any Hadoop
> 3-related support comes into the picture, and maybe that will start soon;
> there were some more pings on related JIRAs this week. You could view it as
> early setup for that move.
>
>
> On Thu, Feb 8, 2018 at 12:57 PM Reynold Xin  wrote:
>
>> Does it gain us anything to drop 2.6?
>>
>> > On Feb 8, 2018, at 10:50 AM, Sean Owen  wrote:
>> >
>> > At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both fairly
>> old, and actually, not different from 2.7 with respect to Spark. That is, I
>> don't know if we are actually maintaining anything here but a separate
>> profile and 2x the number of test builds.
>> >
>> > The cost is, by the same token, low. However I'm floating the idea of
>> removing the 2.6 profile and just requiring 2.7+ as of Spark 2.4?
>>
>


Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Sean Owen
Mostly just shedding the extra build complexity, and builds. The primary
little annoyance is it's 2x the number of flaky build failures to examine.
I suppose it allows using a 2.7+-only feature, but outside of YARN, not
sure there is anything compelling.

It's something that probably gains us virtually nothing now, but isn't too
painful either.
I think it will not make sense to distinguish them once any Hadoop
3-related support comes into the picture, and maybe that will start soon;
there were some more pings on related JIRAs this week. You could view it as
early setup for that move.


On Thu, Feb 8, 2018 at 12:57 PM Reynold Xin  wrote:

> Does it gain us anything to drop 2.6?
>
> > On Feb 8, 2018, at 10:50 AM, Sean Owen  wrote:
> >
> > At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both fairly
> old, and actually, not different from 2.7 with respect to Spark. That is, I
> don't know if we are actually maintaining anything here but a separate
> profile and 2x the number of test builds.
> >
> > The cost is, by the same token, low. However I'm floating the idea of
> removing the 2.6 profile and just requiring 2.7+ as of Spark 2.4?
>


Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Reynold Xin
Does it gain us anything to drop 2.6?

> On Feb 8, 2018, at 10:50 AM, Sean Owen  wrote:
> 
> At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both fairly old, 
> and actually, not different from 2.7 with respect to Spark. That is, I don't 
> know if we are actually maintaining anything here but a separate profile and 
> 2x the number of test builds.
> 
> The cost is, by the same token, low. However I'm floating the idea of 
> removing the 2.6 profile and just requiring 2.7+ as of Spark 2.4?




Drop the Hadoop 2.6 profile?

2018-02-08 Thread Sean Owen
At this point, with Hadoop 3 on deck, I think hadoop 2.6 is both fairly
old, and actually, not different from 2.7 with respect to Spark. That is, I
don't know if we are actually maintaining anything here but a separate
profile and 2x the number of test builds.

The cost is, by the same token, low. However I'm floating the idea of
removing the 2.6 profile and just requiring 2.7+ as of Spark 2.4?


Re: Difficulties building spark-master with sbt

2018-02-08 Thread Jacek Laskowski
Hi,

s,sbt ./build/sbt,./build/sbt

In other words, don't pass ./build/sbt as an argument to an installed sbt;
execute ./build/sbt itself (you don't even have to install sbt to build
spark, as it's included in the repo and the script uses it internally).
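
A concrete version of that substitution, applied to the command from the
original message and run from the root of the spark checkout:

    # Invoke the bundled launcher directly instead of passing it to an installed sbt
    ./build/sbt -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive -Phive-thriftserver clean package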

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski

On Thu, Feb 8, 2018 at 12:12 AM, ds  wrote:

> After cloning today's version of spark-master, I run the following command:
> S:\spark-master>sbt ./build/sbt -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0
> -Phive -Phive-thriftserver clean package
> with the intention of building both the source and test projects and
> generating the corresponding .jar files.
>
> The script started regularly, but ultimately failed with the following log
> excerpt:
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> MaxPermSize=256m;
> support was removed in
> 8.0
> [info] Loading project definition from S:\spark-master\project
> [info] Resolving key references (16939 settings) ...
> [info] Set current project to spark-parent (in build
> file:/S:/spark-master/)
> [error] Expected letter
> [error] Expected symbol
> [error] Expected '!'
> [error] Expected '+'
> [error] Expected '++'
> [error] Expected '^'
> [error] Expected '^^'
> [error] Expected 'debug'
> [error] Expected 'info'
> [error] Expected 'warn'
> [error] Expected 'error'
> [error] Expected ';'
> [error] Expected end of input.
> [error] Expected 'early('
> [error] Expected '-'
> [error] Expected '--'
> [error] Expected 'show'
> [error] Expected 'all'
> [error] Expected '*'
> [error] Expected '{'
> [error] Expected project ID
> [error] Expected configuration
> [error] Expected key
> [error] ./build/sbt
> [error] ^
>
>
> I tried to follow the instructions found at
> http://www.sparktutorials.net/building-apache-spark-on-your-local-machine
> to
> the best of my understanding, but I don't know how to interpret the error
> and where to begin the troubleshooting.
>
> I'm using eclipse as my IDE, so both scala and java seem to be setup
> properly. Although, after running out of options, I simply run
> S:\spark-master>sbt compile, which failed with these errors (I don't know
> whether this is relevant):
> [error] (core/compile:managedResources) java.io.IOException: Cannot run
> program "bash": CreateProcess error=2, The system cannot find the file
> specified
> [error] (network-common/compile:compileIncremental) java.io.IOException:
> Cannot run program "S:\Program Files\Java\bin\javac" (in directory
> "S:\spark-master"): CreateProcess error=2, The system cannot find the file
> specified
> [error] (tags/compile:compileIncremental) java.io.IOException: Cannot run
> program "S:\Program Files\Java\bin\javac" (in directory "S:\spark-master"):
> CreateProcess error=2, The system cannot find the file specified
>
> Note that javac is located in S:\Program Files\Java\jdk1.8.0_77\bin.
>
> So, I would appreciate help in building and packaging the src and test
> components of the spark source.
>
>
>
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>