dependencySet, but provided will mark the entire dependency tree as
excluded. It is also possible to exclude jar by jar, but this is
pretty error prone and messy.
On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers ko...@tresata.com
wrote:
yes in sbt assembly you can exclude jars (although i never had a
need
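for illustration, a minimal sketch of jar exclusion with sbt-assembly (the excludedJars key from the plugin README; the jar name here is hypothetical):

excludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter { _.data.getName == "unwanted-library.jar" } // hypothetical jar to drop
}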
associated with manual
translation of dependency specs from one system to another, while
still maintaining the things which are hard to translate (plugins).
On Wed, Feb 26, 2014 at 7:17 AM, Koert Kuipers ko...@tresata.com
wrote:
We maintain an in-house spark build using sbt. We have
github is not aware of the new repo being a base-fork, so its not easy to
re-point pull requests. i am guessing it didnt get cloned from the
incubator spark one?
On Wed, Feb 26, 2014 at 5:56 PM, Patrick Wendell pwend...@gmail.com wrote:
Sorry if this wasn't clear - If you are in the middle of
we have a maven corporate repository inhouse and of course we also use
maven central. sbt can handle retrieving from and publishing to maven
repositories just fine. we have maven, ant/ivy and sbt projects depending
on each others artifacts. not sure i see the issue there.
On Tue, Mar 11, 2014 at
Asm is such a mess. And their suggested solution being everyone should
shade it sounds pretty awful to me (not uncommon to have shaded asm 15
times in a single project). But I guess you are right that shading is the
only way to deal with it at this point...
On Mar 11, 2014 5:35 PM, Kevin Markey
- GitHub: https://github.com/andypetrella
- Masterbranch: https://masterbranch.com/andy.petrella
On Sat, Mar 15, 2014 at 7:06 PM, Koert Kuipers ko...@tresata.com wrote:
just going head first without any thinking, it changed flatMap to
flatMapData and added a flatMap. for FlatMappedRDD my
i believe kryo serialization uses runtime class, not declared class
we have no issues serializing covariant scala lists
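the runtime-vs-declared distinction in miniature (plain scala, no kryo needed to see the point):

val xs: List[Any] = List(1, 2, 3)  // declared type: List[Any]
println(xs.getClass.getName)       // runtime class: scala.collection.immutable.$colon$colon
// kryo dispatches on getClass, so the declared (covariant) type never matters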
On Sat, Mar 22, 2014 at 11:59 AM, Pascal Voitot Dev
pascal.voitot@gmail.com wrote:
On Sat, Mar 22, 2014 at 3:45 PM, Michael Armbrust mich...@databricks.com
wrote:
classes compiled with java7 run fine on java6 if you specified -target
1.6. however if thats the case generally you should also be able to also
then compile it with java 6 just fine.
something compiled with java7 with -target 1.7 will not run on java 6
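as a sketch, the relevant sbt settings for emitting java 6 compatible bytecode from a java 7 toolchain:

// build.sbt: target java 6 bytecode for both java and scala sources
javacOptions ++= Seq("-source", "1.6", "-target", "1.6")
scalacOptions += "-target:jvm-1.6"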
On Sat, Apr 5, 2014 at 9:10 PM, Debasish
patrick,
this has happened before, that a commit introduced java 7 code/dependencies
and your build didnt fail, i think it was when reynold upgraded to jetty 9.
must be that your entire build infrastructure runs java 7...
On Sat, Apr 5, 2014 at 6:06 PM, Patrick Wendell pwend...@gmail.com wrote:
java.lang.ClassNotFoundException: scala.None$
fun stuff!
On Sun, Apr 6, 2014 at 12:13 PM, Koert Kuipers ko...@tresata.com wrote:
patrick,
this has happened before, that a commit introduced java 7
code/dependencies and your build didnt fail, i think it was when reynold
upgraded to jetty 9. must be that your
i suggest we stick to 2.10.3, since otherwise it seems that (surprisingly)
you force everyone to upgrade
On Sun, Apr 6, 2014 at 1:46 PM, Koert Kuipers ko...@tresata.com wrote:
also, i thought scala 2.10 was binary compatible, but does not seem to be
the case. the spark artifacts for scala
it all depends on what kind of traversing. if its point traversing then a
random access based something would be great.
if its more scan-like traversal then spark will fit
On Tue, Apr 8, 2014 at 4:56 PM, Evan Chan e...@ooyala.com wrote:
I doubt Titan would be able to give you traversal of
i believe matei has said before that he would like to crossbuild for 2.10
and 2.11, given that the difference is not as big as between 2.9 and 2.10.
but dont know when this would happen...
On Sat, May 10, 2014 at 11:02 PM, Gary Malouf malouf.g...@gmail.com wrote:
Considering the team just
db tsai, i do not think userClassPathFirst is working, unless the classes
you load dont reference any classes already loaded by the parent
classloader (a mostly hypothetical situation)... i filed a jira for this
here:
https://issues.apache.org/jira/browse/SPARK-1863
On Tue, May 20, 2014 at 1:04
i suspect there are more cdh4 than cdh5 clusters. most people plan to move
to cdh5 within say 6 months.
On Fri, Aug 29, 2014 at 3:57 AM, Andrew Ash and...@andrewash.com wrote:
FWIW we use CDH4 extensively and would very much appreciate having a
prebuilt version of Spark for it.
We're doing
custom spark builds should not be the answer. at least not if spark ever
wants to have a vibrant community for spark apps.
spark does support a user-classpath-first option, which would deal with
some of these issues, but I don't think it works.
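for reference, a sketch of how that option gets set (the exact config key moved around between spark versions, so treat these names as assumptions to verify against your release):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.userClassPathFirst", "true") // spark 1.3+
  .set("spark.driver.userClassPathFirst", "true")   // older releases used spark.files.userClassPathFirst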
On Sep 4, 2014 9:01 AM, Felix Garcia Borrego
my experience is that there are still a lot of java 6 clusters out there.
also distros that bundle spark still support java 6
On Oct 17, 2014 8:01 PM, Andrew Ash and...@andrewash.com wrote:
Hi Spark devs,
I've heard a few times that keeping support for Java 6 is a priority for
Apache Spark.
100 max width seems very restrictive to me.
even the most restrictive environment i have for development (ssh with
emacs) i get a lot more characters to work with than that.
personally i find the code harder to read, not easier. like i kept wondering why there are weird newlines in the middle of
cases where the current limit is useful (e.g. if you have many windows open in a large screen).
- Patrick
On Thu, Oct 23, 2014 at 11:03 AM, Koert Kuipers ko...@tresata.com
wrote:
100 max width seems very restrictive to me.
even the most restrictive environment i have for development (ssh
<failOnViolation>${scalastyle.failonviolation}</failOnViolation>
<includeTestSourceDirectory>false</includeTestSourceDirectory>
<failOnWarning>false</failOnWarning>
<sourceDirectory>${basedir}/src/main/scala</sourceDirectory>
On Thu, Oct 23, 2014 at 12:07 PM, Koert Kuipers ko...@tresata.com wrote:
Hey Ted
SKIPPED
in this case i dont care about Hive, but i would have liked to see REPL
run, and Kafka.
On Thu, Oct 23, 2014 at 4:44 PM, Ted Yu yuzhih...@gmail.com wrote:
Created SPARK-4066 and attached patch there.
On Thu, Oct 23, 2014 at 1:07 PM, Koert Kuipers ko...@tresata.com
oh i found some stuff about tests and how to continue them, gonna try that
now (-fae switch). should have googled before asking...
On Fri, Oct 24, 2014 at 3:59 PM, Koert Kuipers ko...@tresata.com wrote:
thanks ted.
apologies for complaining about maven here again, but this is the first
time
separated) list you provide to
-pl. Also before using -pl you should do a mvn compile package install
on all modules. Use the -pl after those steps are done - and then it is
very effective.
2014-10-24 13:08 GMT-07:00 Sean Owen so...@cloudera.com:
On Fri, Oct 24, 2014 at 8:59 PM, Koert
editor of your choice + sbt console + grep works great.
if only folks stopped using wildcard imports (they have little benefit in
terms of coding yet require an IDE with 1G+ of ram to track em down).
On Mon, Oct 27, 2014 at 9:17 AM, andy petrella andy.petre...@gmail.com
wrote:
I second the
hello all,
we at tresata wrote a library to provide for batch integration between
spark and kafka (distributed write of rdd to kafka, distributed read of rdd
from kafka). our main use cases are (in lambda architecture jargon):
* periodic appends to the immutable master dataset on hdfs from kafka
yup, we at tresata do the idempotent store the same way. very simple
approach.
On Fri, Dec 19, 2014 at 5:32 PM, Cody Koeninger c...@koeninger.org wrote:
That KafkaRDD code is dead simple.
Given a user specified map
(topic1, partition0) -> (startingOffset, endingOffset)
(topic1, partition1)
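a minimal sketch of what such a user-specified map could look like in scala (names are hypothetical, just illustrating the shape):

// (topic, partition) -> (startingOffset, endingOffset)
val offsetRanges: Map[(String, Int), (Long, Long)] = Map(
  ("topic1", 0) -> (0L, 1000L),
  ("topic1", 1) -> (0L, 1500L)
)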
yes it does. although the core of spark is written in scala it also
maintains java and python apis, and there is plenty of work for those to
contribute to.
On Sat, Dec 20, 2014 at 7:30 AM, sreenivas putta putta.sreeni...@gmail.com
wrote:
Hi,
I want to contribute for spark in java. Does it
interfaces are both outside catalyst package and in org.apache.spark.sql.
On Tue, Jan 27, 2015 at 9:08 AM, Koert Kuipers ko...@tresata.com wrote:
hey matei,
i think that stuff such as SchemaRDD, columnar storage and perhaps also
query planning can be re-used by many systems that do analysis
The context is that SchemaRDD is becoming a common data format used for
bringing data into Spark from external systems, and used for various
components of Spark, e.g. MLlib's new pipeline API.
i agree. this to me also implies it belongs in spark core, not sql
On Mon, Jan 26, 2015 at 6:11 PM,
useless.
On Tue, Feb 10, 2015 at 11:47 AM, Koert Kuipers ko...@tresata.com wrote:
so i understand the success of spark.sql. besides the fact that anything
with the words SQL in its name will have thousands of developers running
towards it because of the familiarity, there is also a genuine
in an efficient columnar format. And you can also easily persist it on disk using Parquet, which is also columnar.
Cheng
On 1/29/15 1:24 PM, Koert Kuipers wrote:
to me the word DataFrame does come with certain expectations. one of them is that the data is stored columnar. in R
thread, Koert)
On Mon, Mar 23, 2015 at 3:52 PM, Koert Kuipers ko...@tresata.com
wrote:
see email below. reynold suggested i send it to dev instead of user
-- Forwarded message --
From: Koert Kuipers ko...@tresata.com
Date: Mon, Mar 23, 2015 at 4:36 PM
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L894
It seems fine to have the same option for the loading functions, if
it's easy to just pass this config into the input format.
On Tue, Mar 24, 2015 at 3:46 PM, Koert Kuipers ko...@tresata.com wrote
see email below. reynold suggested i send it to dev instead of user
-- Forwarded message --
From: Koert Kuipers ko...@tresata.com
Date: Mon, Mar 23, 2015 at 4:36 PM
Subject: hadoop input/output format advanced control
To: u...@spark.apache.org u...@spark.apache.org
currently its
this is possible to build over the core API, it's pretty natural to organize it that way, same as Spark Streaming is a library.
Matei
On Jan 26, 2015, at 4:26 PM, Koert Kuipers
ko...@tresata.com
wrote:
The context is that SchemaRDD is becoming a common
i am not sure eol means much if it is still actively used. we have a lot of
clients with centos 5 (for which we still support python 2.4 in some form
or another, fun!). most of them are on centos 6, which means python 2.6. by
cutting out python 2.6 you would cut out the majority of the actual
, Reynold Xin r...@databricks.com wrote:
Guys thanks for chiming in, but please focus on Java here. Python is an
entirely separate issue.
On Thu, Apr 30, 2015 at 12:53 PM, Koert Kuipers ko...@tresata.com wrote:
i am not sure eol means much if it is still actively used. we have a lot
of clients
it seems spark is happy to upgrade scala, drop older java versions, upgrade
incompatible library versions (akka), and all of this within spark 1.x
does the 1.x mean anything in terms of compatibility of dependencies? or is
that limited to its own api? what are the rules?
On May 1, 2015 9:04 AM,
i think i might be misunderstanding, but shouldnt java 6 currently be used
in jenkins?
On Sat, May 2, 2015 at 11:53 PM, shane knapp skn...@berkeley.edu wrote:
that's kinda what we're doing right now, java 7 is the default/standard on
our jenkins.
or, i vote we buy a butler's outfit for
we also launch jobs programmatically, both on standalone mode and
yarn-client mode. in standalone mode it always worked, in yarn-client mode
we ran into some issues and were forced to use spark-submit, but i still
have on my todo list to move back to a normal java launch without
spark-submit at
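a sketch of the plain programmatic launch style described above (spark 1.x; assumes the yarn/hadoop config is on the classpath for yarn-client mode):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("yarn-client") // or "spark://host:7077" for standalone
  .setAppName("my-app")
val sc = new SparkContext(conf) // launched directly, no spark-submit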
this looks like a mistake in FrequentItems to me. if the map is full
(map.size==size) then it should still add the new item (after removing
items from the map and decrementing counts).
if its not a mistake then at least it looks to me like the algo is
different than described in the paper. is
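for comparison, a toy sketch of the add step as in the karp et al style frequent-items algorithm (add first, then decrement and evict when over capacity); this is an assumption-laden illustration, not spark's actual FrequentItems code:

import scala.collection.mutable

def add[T](counts: mutable.Map[T, Long], size: Int, item: T): Unit = {
  counts(item) = counts.getOrElse(item, 0L) + 1L
  if (counts.size > size) {
    // over capacity: decrement every count and evict the zeros
    counts.keys.toList.foreach { k =>
      counts(k) -= 1L
      if (counts(k) <= 0L) counts.remove(k)
    }
  }
}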
i would drop scala 2.10, but definitely keep java 7
cross build for scala 2.12 is great, but i dont know how that works with
java 8 requirement. dont want to make java 8 mandatory.
and probably stating the obvious, but a lot of apis got polluted due to
binary compatibility requirement. cleaning
good point about dropping <2.2 for hadoop. you dont want to deal with
protobuf 2.4 for example
On Wed, Nov 11, 2015 at 4:58 AM, Sean Owen wrote:
> On Wed, Nov 11, 2015 at 12:10 AM, Reynold Xin wrote:
> > to the Spark community. A major release should
romi,
unless i am misunderstanding your suggestion you might be interested in
projects like the new mahout where they try to abstract out the engine with
bindings, so that they can support multiple engines within a single
platform. I guess cascading is heading in a similar direction (although no
People who do upstream builds of spark (think bigtop and hadoop distros)
are used to legacy systems like maven, so maven is the default build. I
don't think it will change.
Any improvements for the sbt build are of course welcome (it is still used
by many developers), but i would not do anything
if there is no strong preference for one dependencies policy over another,
but consistency between the 2 systems is desired, then i believe maven can
be made to behave like ivy pretty easily with a setting in the pom
On Fri, Nov 6, 2015 at 5:21 AM, Steve Loughran
wrote:
if DataFrame aspires to be more than a vehicle for SQL then i think it
would be mistake to allow multiple column names. it is very confusing.
pandas indeed allows this and it has led to many bugs. R does not allow it
for data.frame (it renames the name dupes).
i would consider a csv with
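a quick illustration of the confusion (hypothetical repl session, assuming spark implicits in scope):

val df = Seq((1, 2)).toDF("a", "a") // two columns both named "a"
df.select("a")                      // which "a"? spark throws an ambiguous-reference AnalysisException here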
i ran into the same thing in scala api. we depend heavily on comma
separated paths, and it no longer works.
On Tue, Oct 6, 2015 at 3:02 AM, Blaž Šnuderl wrote:
> Hello everyone.
>
> It seems pyspark dataframe read is broken for reading multiple files.
>
> sql.read.json(
>> Could someone please file a JIRA to track this?
>> https://issues.apache.org/jira/browse/SPARK
>>
>> On Tue, Oct 6, 2015 at 1:21 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i ran into the same thing in scala api. we depend heavily on comma
spark 1.x has been supporting scala 2.11 for 3 or 4 releases now. seems to
me you already provide a clear upgrade path: get on scala 2.11 before
upgrading to spark 2.x
from scala team when scala 2.10.6 came out:
We strongly encourage you to upgrade to the latest stable version of Scala
2.11.x, as
rhel/centos 6 ships with python 2.6, doesnt it?
if so, i still know plenty of large companies where python 2.6 is the only
option. asking them for python 2.7 is not going to work
so i think its a bad idea
On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland
wrote:
> I
access). Does this address the Python versioning concerns for RHEL users?
>
> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> yeah, the practical concern is that we have no control over java or
>> python version on large company clusters. our curr
>>
>> I've been in a couple of projects using Spark (banking industry) where
>> CentOS + Python 2.6 is the toolbox available.
>>
>> That said, I believe it should not be a concern for Spark. Python 2.6 is
>> old and busted, which is totally opposite to the Spark ph
On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> I think all the slaves need the same (or a compatible) version of Python
>>> installed since they run Python code in PySpark jobs natively.
>>>
>>> On Tue, Jan
> version without making your changes open source. The GPL-compatible
> licenses make it possible to combine Python with other software that is
> released under the GPL; the others don’t.
>
> Nick
>
>
> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.
if python 2.7 only has to be present on the node that launches the app
(does it?) then that could be important indeed.
On Tue, Jan 5, 2016 at 6:02 PM, Koert Kuipers <ko...@tresata.com> wrote:
> interesting i didnt know that!
>
> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas &l
I also thought the idea was to drop 2.10. Do we want to cross build for 3
scala versions?
On Nov 25, 2015 3:54 AM, "Sandy Ryza" wrote:
> I see. My concern is / was that cluster operators will be reluctant to
> upgrade to 2.0, meaning that developers using those clusters
if i wanted to pimp DataFrame to add subtract and intersect myself with a
physical operator, without needing to modify spark directly, is that
currently possible/intended? or will i run into the private[spark] issue?
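to be clear, the extension pattern itself needs only an implicit class; a sketch using public api only (the custom physical operator is where private[spark] gets in the way, this just delegates to the built-in except):

import org.apache.spark.sql.DataFrame

implicit class RichDataFrame(df: DataFrame) {
  // hypothetical name; subtract-by-custom-plan would need spark internals
  def subtractRows(other: DataFrame): DataFrame = df.except(other)
}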
On Fri, Nov 27, 2015 at 7:36 PM, Reynold Xin wrote:
> We
> Analyzer it looks very much like a UTF8String is very corrupt.
>
> Cheers,
>
>
> On Fri, 27 May 2016 at 21:00 Koert Kuipers <ko...@tresata.com> wrote:
>
>> hello all,
>> after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
>> Cheng
>>
>>
>> On 5/25/16 12:30 PM, Reynold Xin wrote:
>>
>> Based on this discussion I'm thinking we should deprecate the two explode
>> functions.
>>
>> On Wednesday, May 25, 2016, Koert Kuipers <ko...@tresata.com>
hello all,
after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
trying to run some algorithms at scale on our cluster.
unfortunately this means that when i see errors i am having a harder time
boiling it down to a small reproducible example.
today we are running an iterative
in spark 1.6.1 we used:
sqlContext.read
.format("com.databricks.spark.csv")
.option("delimiter", "~")
.option("quote", null)
this effectively turned off quoting, which is a necessity for certain data
formats where quoting is not supported and "\"" is a valid character itself
in the data.
n API), but that's probably OK
> given they shouldn't change all the time.
>
> Ticket https://issues.apache.org/jira/browse/SPARK-15585
>
>
>
>
> On Thu, May 26, 2016 at 3:35 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> in spark 1.6.1 we us
cannot just send it over.
i will try to create a small test program to reproduce it.
On Fri, May 27, 2016 at 4:25 PM, Reynold Xin <r...@databricks.com> wrote:
> They should get printed if you turn on debug level logging.
>
> On Fri, May 27, 2016 at 1:00 PM, Koert Kuipers <ko...@
hey,
since SPARK-15982 was fixed (https://github.com/apache/spark/pull/13727) i
believe all external DataSources that rely on using .load(path) without
being a FileFormat themselves are broken.
i noticed this because our unit tests for the elasticsearch datasource
broke.
i commented on the
y or so instead informally in
> conversation. Does anyone have a particularly strong opinion on that?
> That's basically an extra 3 month period.
>
> https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage
>
> On Tue, Jan 26, 2016 at 10:00 PM, Koert Kuipers <ko...@tresata.com>
dataframe df1:
schema:
StructType(StructField(x,IntegerType,true))
explain:
== Physical Plan ==
MapPartitions , obj#135: object, [if (input[0, object].isNullAt)
null else input[0, object].get AS x#128]
+- MapPartitions , createexternalrow(if (isnull(x#9)) null else
x#9), [input[0, object] AS
https://issues.apache.org/jira/browse/SPARK-13531
On Sat, Feb 27, 2016 at 3:49 AM, Reynold Xin <r...@databricks.com> wrote:
> Can you file a JIRA ticket?
>
>
> On Friday, February 26, 2016, Koert Kuipers <ko...@tresata.com> wrote:
>
>> dataframe df1:
since a type alias is purely a convenience thing for the scala compiler,
does option 1 mean that the concept of DataFrame ceases to exist from a
java perspective, and they will have to refer to Dataset?
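for context, the alias in spark 2.x lives in the org.apache.spark.sql package object and is roughly:

type DataFrame = Dataset[Row]

since scalac erases the alias, java code indeed only ever sees Dataset<Row>.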
On Thu, Feb 25, 2016 at 6:23 PM, Reynold Xin wrote:
> When we first
i noticed some things stopped working on datasets in spark 2.0.0-SNAPSHOT,
and with a confusing error message (cannot resolve some column with input
columns []).
for example in 1.6.0-SNAPSHOT:
scala> val ds = sc.parallelize(1 to 10).toDS
ds: org.apache.spark.sql.Dataset[Int] = [value: int]
com> wrote:
> Looks like a bug. I'm also not sure whether we support Option yet. (If
> not, we should definitely support that in 2.0.)
>
> Can you file a JIRA ticket?
>
>
> On Mon, Feb 15, 2016 at 7:12 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> i notic
outside of Spark isn't supposed to use
> it. Mixing Spark library versions is also not recommended, not just
> because of this reason.
>
> There have been other binary changes in the Logging class in the past too.
>
> On Tue, Mar 15, 2016 at 7:49 AM, Koert Kuipers <ko...@tresata.
oh i just noticed the big warning in spark 1.x Logging
* NOTE: DO NOT USE this class outside of Spark. It is intended as an
internal utility.
* This will likely be changed or removed in future releases.
On Tue, Mar 15, 2016 at 3:29 PM, Koert Kuipers <ko...@tresata.com> wrote:
in this commit
8301fadd8d269da11e72870b7a889596e3337839
Author: Marcelo Vanzin
Date: Mon Mar 14 14:27:33 2016 -0700
[SPARK-13626][CORE] Avoid duplicate config deprecation warnings.
the following change was made
-class SparkConf(loadDefaults: Boolean) extends Cloneable
i have been using spark 2.0 snapshots with some libraries built for spark
1.0 so far (simply because it worked). in the last few days i noticed this new
error:
[error] Uncaught exception when running
com.tresata.spark.sql.fieldsapi.FieldsApiSpec: java.lang.AbstractMethodError
sbt.ForkMain$ForkError:
i am trying to understand some parts of the catalyst optimizer. but i
struggle with one bigger picture issue:
LogicalPlan extends TreeNode, which makes sense since the optimizations
rely on tree transformations like transformUp and transformDown.
but how can a LogicalPlan be a tree? isnt it
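a toy sketch of the transformUp pattern catalyst relies on (not catalyst's real TreeNode, just the bottom-up rewrite idea on a hypothetical expression tree):

sealed trait Expr {
  def mapChildren(f: Expr => Expr): Expr
  def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
    val rewritten = mapChildren(_.transformUp(rule)) // children first
    rule.applyOrElse(rewritten, identity[Expr])      // then this node
  }
}
case class Lit(v: Int) extends Expr { def mapChildren(f: Expr => Expr) = this }
case class Add(l: Expr, r: Expr) extends Expr {
  def mapChildren(f: Expr => Expr) = Add(f(l), f(r))
}

// constant folding, bottom-up:
// Add(Lit(1), Add(Lit(2), Lit(3))).transformUp { case Add(Lit(a), Lit(b)) => Lit(a + b) }
// => Lit(6)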
if scala prior to 2.10.4 didn't support java 8, does that mean that 3rd
party scala libraries compiled with a scala version < 2.10.4 might not work
on java 8?
On Mon, Mar 28, 2016 at 7:06 PM, Kostas Sakellis
wrote:
> Also, +1 on dropping jdk7 in Spark 2.0.
>
> Kostas
>
one of our unit tests broke with changes in spark 2.0 snapshot in the last few
days (or maybe i simply missed it for longer). i think it boils down to this:
val df1 = sc.makeRDD(1 to 3).toDF
val df2 = df1.map(row => Row(row(0).asInstanceOf[Int] +
1))(RowEncoder(df1.schema))
println(s"schema before
i think the arguments are convincing, but it also makes me wonder if i live
in some kind of alternate universe... we deploy on customers clusters,
where the OS, python version, java version and hadoop distro are not chosen
by us. so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply
side.
>
> On Thu, Mar 24, 2016 at 4:27 PM, Koert Kuipers <ko...@tresata.com> wrote:
> > i think the arguments are convincing, but it also makes me wonder if i
> live
> > in some kind of alternate universe... we deploy on customers clusters,
> where
> > the OS, pytho
i guess what i am saying is that in a yarn world the only hard restrictions
left are the containers you run in, which means the hadoop version,
java version and python version (if you use python).
On Thu, Mar 24, 2016 at 12:39 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
compatibility wrt Java)
>> Was there a proposal which did not go through ? Not sure if I missed it.
>>
>> Regards
>> Mridul
>>
>>
>> On Thursday, March 24, 2016, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i think that logic is reasonab
the good news is, that from a shared infrastructure perspective, most
places have zero scala, so the upgrade is actually very easy. i can see how
it would be different for say twitter
On Thu, Mar 24, 2016 at 7:50 PM, Reynold Xin wrote:
> If you want to go down that
> On Thursday, March 24, 2016, Koert Kuipers <ko...@tresata.com> wrote:
>
>> i guess what i am saying is that in a yarn world the only hard
>> restrictions left are the containers you run in, which means the hadoop
>> version, java version and python version (if you u
i think that logic is reasonable, but then the same should also apply to
scala 2.10, which is also unmaintained/unsupported at this point (basically
has been since march 2015 except for one hotfix due to a license
incompatibility)
who wants to support scala 2.10 three years after they did the
we are not, but it seems reasonable to me that a user has the ability to
implement their own serializer.
can you refactor and break compatibility, but not make it private?
On Mon, Mar 7, 2016 at 9:57 PM, Josh Rosen wrote:
> Does anyone implement Spark's serializer
do i need to run sbt package before doing tests?
On Mon, Apr 4, 2016 at 11:00 PM, Marcelo Vanzin wrote:
> Hey all,
>
> We merged SPARK-13579 today, and if you're like me and have your
> hands automatically type "sbt assembly" anytime you're building Spark,
> that won't
correctly propagated to all nodes? Are they identical?
> Yes; these files are stored on a shared memory directory accessible to
> all nodes.
>
> Koert Kuipers:
> > we ran into similar issues and it seems related to the new memory
> > management. can you try:
> > spa
about that pro, i think it's more the opposite: many libraries have
stopped maintaining scala 2.10 versions. bugs will no longer be fixed for
scala 2.10 and new libraries will not be available for scala 2.10 at all,
making them unusable in spark.
take for example akka, a distributed messaging
Spark still runs on akka. So if you want the benefits of the latest akka
(not saying we do, was just an example) then you need to drop scala 2.10
On Mar 30, 2016 10:44 AM, "Cody Koeninger" wrote:
> I agree with Mark in that I don't see how supporting scala 2.10 for
> spark
Wed, Mar 30, 2016 at 9:10 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> Spark still runs on akka. So if you want the benefits of the latest akka
>> (not saying we do, was just an example) then you need to drop scala 2.10
>> On Mar 30, 2016 10:44 AM, "Cody Koe
stayed on an
>> old Scala version for multiple years because switching it, or mixing
>> versions, would affect the company's entire codebase.
>>
>> Matei
>>
>> On Mar 30, 2016, at 12:08 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>> oh
we ran into similar issues and it seems related to the new memory
management. can you try:
spark.memory.useLegacyMode = true
On Mon, Apr 4, 2016 at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote:
> [ CC'ing dev list since nearly identical questions have occurred in
> user list recently w/o
got it, but i assume thats an internal implementation detail, and it should
show null not -1?
On Tue, May 24, 2016 at 3:10 AM, Zhan Zhang wrote:
> The reason for "-1" is that the default value for Integer is -1 if the value is null
>
> def defaultValue(jt: String):
hello,
as we continue to test spark 2.0 SNAPSHOT in-house we ran into the
following trying to port an existing application from spark 1.6.1 to spark
2.0.0-SNAPSHOT.
given this code:
case class Test(a: Int, b: String)
val rdd = sc.parallelize(List(Row(List(Test(5, "ha"), Test(6, "ba")
val
https://issues.apache.org/jira/browse/SPARK-15507
On Tue, May 24, 2016 at 12:21 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Please log a JIRA.
>
> Thanks
>
> On Tue, May 24, 2016 at 8:33 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> hello,
>> as we co
hello all, we are slowly expanding our test coverage for spark
2.0.0-SNAPSHOT to more in-house projects. today i ran into this issue...
this runs fine:
val df = sc.parallelize(List(("1", "2"), ("3", "4"))).toDF("a", "b")
df
.map(row => row)(RowEncoder(df.schema))
.select("a", "b")
.show
databricks.com> wrote:
>
>> It seems like the problem here is that we are not using unique names
>> for mapelements_isNull?
>>
>>
>>
>> On Tue, May 17, 2016 at 3:29 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> hello all, we are slowl
with the introduction of SparkSession SQLContext changed from being a lazy
val to a def.
however this is troublesome if you want to do:
import someDataset.sqlContext.implicits._
because it is no longer a stable identifier, i think? i get:
stable identifier required, but
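a sketch of the usual workaround: bind the def result to a val first, so the import path is a stable identifier again:

val sqlc = someDataset.sqlContext // a val, hence a stable identifier
import sqlc.implicits._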
>> at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1861)
>> at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1860)
>> at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2438)
>> at org.apache.spark.sql.Dataset.head(Dataset.scala:1860)