I think this is likely something that we'll want to do during the code
generation phase, though it's probably not the lowest-hanging fruit at this
point.
On Sun, May 31, 2015 at 5:02 AM, Reynold Xin wrote:
> I think you are looking for
> http://en.wikipedia.org/wiki/Common_subexpression_elimination
I solved the problem.
It was caused by using the spark-core_2.11 Maven artifact.
When I compiled with spark-core_2.10, the problem no longer showed up.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/problem-with-using-mapPartitions-tp12514p12523.html
FYI we merged a patch that improves unit test log debugging. In order for
that to work, all test suites have been changed to extend SparkFunSuite
instead of ScalaTest's FunSuite. We also added a rule in the Scala style
checker to fail Jenkins if FunSuite is used.
The patch that introduced SparkFun
+1 (non-binding, of course)
1. Compiled on OS X 10.10 (Yosemite) OK. Total time: 17:07 min
mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -DskipTests
2. Tested pyspark, MLlib - ran them as well as compared results with 1.3.1
2.1. statistics (min,max,mean,Pearson,Spe
Thank you for your reply.
But the typo is not the reason for the problem.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/problem-with-using-mapPartitions-tp12514p12520.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
We added all the typetags for arguments but haven't gotten around to using
them yet. I think it'd make sense to have them and do the auto cast, but we can
have rules in analysis to forbid certain casts (e.g. don't auto cast double
to int).
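To make the idea concrete, here is a minimal, hypothetical sketch (the names and type set are illustrative, not Catalyst's actual API) of an analysis rule that permits only widening automatic casts and rejects narrowing ones such as double -> int:

```python
# Hypothetical sketch of an analysis rule for automatic casts: allow only
# widening ("safe") conversions, reject narrowing ones like double -> int.
WIDENING = {
    ("int", "long"), ("int", "float"), ("int", "double"),
    ("long", "double"), ("float", "double"),
}

def can_auto_cast(from_type, to_type):
    # Identity casts are always fine; otherwise only widening is allowed.
    return from_type == to_type or (from_type, to_type) in WIDENING

print(can_auto_cast("int", "double"))   # True: widening is safe
print(can_auto_cast("double", "int"))   # False: narrowing is forbidden
```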
On Sat, May 30, 2015 at 7:12 AM, Justin Uang wrote:
> The id
I think you are looking for
http://en.wikipedia.org/wiki/Common_subexpression_elimination in the
optimizer.
One thing to note is that as we do more and more optimization like this,
the optimization time might increase. Do you see a case where this can
bring you substantial performance gains?
On
Name resolution is not as easy as that, I think. Wenchen can maybe give you
some advice on resolution for this one.
On Sat, May 30, 2015 at 9:37 AM, Yijie Shen wrote:
> I think just match the Column’s expr as UnresolvedAttribute and use
> UnresolvedAttribute’s name to match schema’s field name is enough.
bq. val result = fDB.mappartitions(testMP).collect
Not sure if you pasted the above code as-is - there was a typo: the method
name should be mapPartitions
Cheers
On Sat, May 30, 2015 at 9:44 AM, unioah wrote:
> Hi,
>
> I try to aggregate the value in each partition internally.
> For example,
>
> Bef
Thanks Josh and Yin.
Created following JIRA for the same :-
https://issues.apache.org/jira/browse/SPARK-7970
Thanks
-Nitin
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tp12466p12515.html
Hi,
I try to aggregate the value in each partition internally.
For example,
Before:
worker 1: 1, 2, 1        worker 2: 2, 1, 2
After:
worker 1: (1->2), (2->1)  worker 2: (1->1), (2->2)
I try to use mappartitions,
object MyTest {
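The original snippet is cut off in the archive, but the per-partition counting described above can be sketched in plain Python (no Spark required) as the kind of function one could pass to RDD.mapPartitions; the names here are illustrative:

```python
# Hypothetical sketch: a function suitable for RDD.mapPartitions. It consumes
# one partition's iterator and yields a single dict of element counts for that
# partition, so aggregation happens internally per partition.
from collections import Counter

def count_partition(iterator):
    yield dict(Counter(iterator))

# Simulating the two workers from the example locally:
partitions = [[1, 2, 1], [2, 1, 2]]
results = [next(count_partition(iter(p))) for p in partitions]
print(results)  # [{1: 2, 2: 1}, {2: 2, 1: 1}]
```

With Spark, this would be used as roughly `rdd.mapPartitions(count_partition).collect()`.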
I think just matching the Column’s expr as an UnresolvedAttribute and using the
UnresolvedAttribute’s name to match the schema’s field name is enough.
Seems there is no need to regard expr as a more general one. :)
On May 30, 2015 at 11:14:05 PM, Girardot Olivier
(o.girar...@lateral-thoughts.com) wrote:
Jira done : ht
On second thought, perhaps this could be done by writing a rule that builds
the DAG of dependencies between expressions, then converts it into several
layers of projections, where each new layer is allowed to depend on
expression results from previous projections?
Are there any pitfalls to this appro
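The layering idea can be sketched in a few lines of Python; this is a hedged illustration of the approach (the function name and dependency representation are invented for the example, not Catalyst code):

```python
# Hypothetical sketch: given each expression's direct dependencies, split the
# expressions into projection "layers" so that every layer references only
# results computed by earlier layers.
def layer_expressions(deps):
    """deps maps an expression name to the set of names it depends on."""
    layers, placed = [], set()
    while len(placed) < len(deps):
        # Everything whose dependencies are already computed joins this layer.
        ready = {e for e, d in deps.items() if e not in placed and d <= placed}
        if not ready:
            raise ValueError("cyclic dependency between expressions")
        layers.append(sorted(ready))
        placed |= ready
    return layers

# e.g. y = x * 7 and z = y * 3 need separate layers, because z reads y's result.
print(layer_expressions({"x": set(), "y": {"x"}, "z": {"y"}}))
# [['x'], ['y'], ['z']]
```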
If I do the following
df2 = df.withColumn('y', df['x'] * 7)
df3 = df2.withColumn('z', df2.y * 3)
df3.explain()
Then the result is
> Project [date#56,id#57,timestamp#58,x#59,(x#59 * 7.0) AS y#64,((x#59
* 7.0) AS y#64 * 3.0) AS z#65]
> PhysicalRDD [date#56,id#57,timestamp#58,x
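The plan above shows `(x#59 * 7.0)` appearing twice, which is exactly what common subexpression elimination would avoid. As a hedged, self-contained illustration (a toy evaluator, not Catalyst's implementation), caching each distinct subtree makes the shared `x * 7` compute only once:

```python
# Hypothetical illustration of common subexpression elimination: evaluate a
# tiny expression tree while caching each distinct subtree, so a repeated
# subexpression (like x * 7, read by both y and z) is computed only once.
def evaluate(expr, env, cache, counts):
    if not isinstance(expr, tuple):           # column reference or literal
        return env.get(expr, expr)
    key = repr(expr)
    if key not in cache:
        counts[key] = counts.get(key, 0) + 1  # count real computations
        op, left, right = expr
        a = evaluate(left, env, cache, counts)
        b = evaluate(right, env, cache, counts)
        cache[key] = a * b if op == "*" else a + b
    return cache[key]

x_times_7 = ("*", "x", 7)                     # the shared subexpression
z_expr = ("*", x_times_7, 3)                  # z = (x * 7) * 3
cache, counts = {}, {}
y_val = evaluate(x_times_7, {"x": 2}, cache, counts)
z_val = evaluate(z_expr, {"x": 2}, cache, counts)
print(y_val, z_val, counts[repr(x_times_7)])  # 14 42 1 -> x * 7 evaluated once
```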
Jira done : https://issues.apache.org/jira/browse/SPARK-7969
I've already started working on it, but it's less trivial than it seems
because I don't exactly know the inner workings of the catalog,
and how to get the qualified name of a column to match it against the
schema/catalog.
Regards,
Olivier
The idea of asking for both the argument and return class is interesting. I
don't think we do that for the scala APIs currently, right? In
functions.scala, we only use the TypeTag for RT.
def udf[RT: TypeTag, A1: TypeTag](f: Function1[A1, RT]):
UserDefinedFunction = {
UserDefinedFunction(f,
No 1.4.0 Blockers at this point, which is great. Forking this thread
to discuss something else.
There are 92 issues targeted for 1.4.0, 28 of which are marked
Critical. Many are procedural issues like "update docs for 1.4" or
"check X for 1.4". Are these resolved? They sound like things that are
d
I downloaded the source tarball and ran a command similar to the following:
clean package -DskipTests
Then I ran the following command.
Fyi
> On May 30, 2015, at 12:42 AM, Tathagata Das wrote:
>
> Was it a clean compilation?
>
> TD
>
>> On Fri, May 29, 2015 at 10:48 PM, Ted Yu wrote:
>
Yeah, it would be great to support a Column. Can you create a JIRA, and possibly
a pull request?
On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> Actually, the Scala API too is only based on column name
>
> On Fri, May 29, 2015 at 11:23, Olivier Girardot <
>
Was it a clean compilation?
TD
On Fri, May 29, 2015 at 10:48 PM, Ted Yu wrote:
> Hi,
> I ran the following command on 1.4.0 RC3:
>
> mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive package
>
> I saw the following failure:
>
> StreamingContextSuite:
> - from no conf co
I think you are right that there is no way to call Java UDF without
registration right now. Adding another 20 methods to functions would be
scary. Maybe the best way is to have a companion object
for UserDefinedFunction, and define UDF there?
e.g.
object UserDefinedFunction {
def define(f: org