Spark 2.4.1 on Kubernetes - DNS resolution of driver fails

2019-04-29 Thread Olivier Girardot
point.sh used in the Kubernetes packaging). We can add a simple step to the init container trying to do the DNS resolution and failing after 60s if it did not work. But these steps won't change the fact that the driver will stay stuck thinking we're still in the case of the
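The fail-fast init-container check proposed above can be sketched in plain Python (a hypothetical helper, not part of the Spark distribution): retry DNS resolution of the driver hostname until it succeeds or a 60-second deadline passes.

```python
import socket
import time

def wait_for_dns(hostname, timeout_s=60, interval_s=2):
    """Retry DNS resolution until it succeeds or the deadline passes.

    Returns the resolved IP address, or raises RuntimeError on timeout,
    mirroring the proposed fail-fast init-container step.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return socket.gethostbyname(hostname)
        except socket.gaierror:
            if time.monotonic() >= deadline:
                raise RuntimeError(
                    "DNS resolution of %s failed after %ds" % (hostname, timeout_s))
            time.sleep(interval_s)
```

An init container running this (or the shell equivalent) would surface a DNS failure explicitly instead of leaving the driver stuck.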

Re: Spark 2.4.1 on Kubernetes - DNS resolution of driver fails

2019-05-03 Thread Olivier Girardot
ed on other vendors ? Also on > the kubelet nodes did you notice any pressure on the DNS side? > > Li > > > On Mon, Apr 29, 2019, 5:43 AM Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Hi everyone, >> I have ~300 spark jobs on Kubernetes (GKE)

Re: [External Sender] Re: Spark 2.4.1 on Kubernetes - DNS resolution of driver fails

2019-06-18 Thread Olivier Girardot
I am also facing the same issue on my kubernetes > cluster(v1.11.5) on AWS with spark version 2.3.3, any luck in figuring out > the root cause? > > On Fri, May 3, 2019 at 5:37 AM Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Hi,

Spark 2.4.3 source download is a dead link

2019-06-18 Thread Olivier Girardot
Hi everyone, FYI the spark source download link on spark.apache.org is dead : https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-sources.tgz Regards, -- *Olivier Girardot*

Re: Spark 2.4.3 source download is a dead link

2019-06-24 Thread Olivier Girardot
h .replace doesn't seem to have ever worked? > https://github.com/apache/spark-website/pull/207 > > On Tue, Jun 18, 2019 at 4:07 AM Olivier Girardot > wrote: > > > > Hi everyone, > > FYI the spark source download link on spark.apache.org is dead : > >

Back to SQL

2018-10-03 Thread Olivier Girardot
Hi everyone, Is there any known way to go from a Spark SQL Logical Plan (optimised ?) Back to a SQL query ? Regards, Olivier.
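No public plan-to-SQL API is referenced in this thread; purely as a toy illustration, here is what walking a tiny hand-rolled logical-plan tree back into a SQL string could look like. The Relation/Filter/Project node classes below are made up for the sketch; they are not Catalyst classes.

```python
# Toy logical-plan nodes (hypothetical, not Catalyst classes).
class Relation:
    def __init__(self, table): self.table = table

class Filter:
    def __init__(self, condition, child): self.condition, self.child = condition, child

class Project:
    def __init__(self, columns, child): self.columns, self.child = columns, child

def to_sql(plan):
    """Recursively render a toy plan tree as a SQL string."""
    if isinstance(plan, Relation):
        return plan.table
    if isinstance(plan, Filter):
        return "(SELECT * FROM %s WHERE %s)" % (to_sql(plan.child), plan.condition)
    if isinstance(plan, Project):
        return "(SELECT %s FROM %s)" % (", ".join(plan.columns), to_sql(plan.child))
    raise TypeError(type(plan))

plan = Project(["name"], Filter("age > 21", Relation("people")))
sql = to_sql(plan)
```

A real solution would have to handle joins, aliases, and expressions produced by the optimizer, which is where the difficulty lies.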

Re: tpcds for spark2.0

2016-07-29 Thread Olivier Girardot
gn instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org $apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD Olivier Girardot | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94

Spark SQL and Kryo registration

2016-08-03 Thread Olivier Girardot
Hi everyone, I'm currently trying to use Spark 2.0.0 and make Dataframes work with kryo.registrationRequired=true. Is it even possible at all considering the codegen ? Regards, Olivier Girardot | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94

Re: Spark SQL and Kryo registration

2016-08-17 Thread Olivier Girardot
> wrote: Hi Olivier, I don't know either, but am curious what you've tried already. Jacek On 3 Aug 2016 10:50 a.m., "Olivier Girardot" < o.girardot@lateral-thoughts.com > wrote: Hi everyone, I'm currently trying to use Spark 2.0.0 and making Dataframes work with kryo. regis

Re: Aggregations with scala pairs

2016-08-17 Thread Olivier Girardot
aggExprs).map { pairExpr => strToExpr(pairExpr._2)(df(pairExpr._1).expr) }.toSeq) } regards -- Ing. Ivaldi Andres Olivier Girardot | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94

Spark SQL - Applying transformation on a struct inside an array

2016-09-13 Thread Olivier Girardot
ly a transformation on complex nested datatypes (arrays and struct) on a Dataframe updating the value itself. Regards, Olivier Girardot

Using Spark as a Maven dependency but with Hadoop 2.6

2016-09-21 Thread Olivier Girardot
according to different versions of Hadoop available ? Thanks for your time ! Olivier Girardot

Re: Using Spark as a Maven dependency but with Hadoop 2.6

2016-09-28 Thread Olivier Girardot
s there by any chance publications of Spark 2.0.0 with different classifier according to different versions of Hadoop available ? Thanks for your time ! Olivier Girardot Olivier Girardot | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94

Re: Using Spark as a Maven dependency but with Hadoop 2.6

2016-09-29 Thread Olivier Girardot
, because the Hadoop APIs that are used are all the same across these versions. That would be the thing that makes you need multiple versions of the artifact under multiple classifiers. On Wed, Sep 28, 2016 at 1:16 PM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: ok, don't

Re: Spark SQL - Applying transformation on a struct inside an array

2017-01-05 Thread Olivier Girardot
by computations, but that's bound to be inefficient * or to generate bytecode using the schema to do the nested "getRow,getSeq…" and re-create the rows once transformation is applied I'd like to open an issue regarding that use case because it's not the first or last tim

Re: welcoming Burak and Holden as committers

2017-01-28 Thread Olivier Girardot
evangelist. She has written a few books on Spark, as well as frequent contributions to the Python API to improve its usability and performance. Please join me in welcoming the two! Olivier Girardot | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94

Re: Will higher order functions in spark SQL be pushed upstream?

2017-06-09 Thread Olivier Girardot
lter pushdown issues with complex > datatypes? > > Thanks! > > -- *Olivier Girardot* | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94

Nested "struct" fonction call creates a compilation error in Spark SQL

2017-06-15 Thread Olivier Girardot
JIRA or is there a workaround ? Regards, -- *Olivier Girardot* | Associé o.girar...@lateral-thoughts.com

Re: Build spark failed with maven

2015-02-14 Thread Olivier Girardot
Hi, this was not reproduced for me, what kind of jdk are you using for the zinc server ? Regards, Olivier. 2015-02-11 5:08 GMT+01:00 Yi Tian : > Hi, all > > I got an ERROR when I build spark master branch with maven (commit: > 2d1e916730492f5d61b97da6c483d3223ca44315) > > [INFO] > [INFO] > --

[Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
ssion introduced by the 1.3.x DataFrame because JavaSchemaRDD used to be JavaRDDLike but DataFrame's are not (and are not callable with JFunctions), I can open a Jira if you want ? Regards, -- *Olivier Girardot* | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Yes thanks ! Le ven. 17 avr. 2015 à 16:20, Ted Yu a écrit : > The image didn't go through. > > I think you were referring to: > override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f) > > Cheers > > On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot <

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Ok, do you want me to open a pull request to fix the dedicated documentation ? Le ven. 17 avr. 2015 à 18:14, Reynold Xin a écrit : > I think in 1.3 and above, you'd need to do > > .sql(...).javaRDD().map(..) > > On Fri, Apr 17, 2015 at 9:22 AM, Olivier Girardot &l

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
Is there any convention *not* to show java 8 versions in the documentation ? Le ven. 17 avr. 2015 à 21:39, Reynold Xin a écrit : > Please do! Thanks. > > > On Fri, Apr 17, 2015 at 2:36 PM, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Ok,

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
ill more 7 users than 8. > > > On Fri, Apr 17, 2015 at 3:36 PM, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Is there any convention *not* to show java 8 versions in the >> documentation ? >> >> Le ven. 17 avr. 2015 à 21:39, Reynold Xin

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

2015-04-17 Thread Olivier Girardot
> On Fri, Apr 17, 2015 at 3:36 PM, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Is there any convention *not* to show java 8 versions in the >> documentation ? >> >> Le ven. 17 avr. 2015 à 21:39, Reynold Xin a écrit : >> >

Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-17 Thread Olivier Girardot
Hi Nipun, I'm sorry but I don't understand exactly what your problem is ? Regarding the org.apache.spark.sql.Row, it does exist in the Spark SQL dependency. Is it a compilation problem ? Are you trying to run a main method using the pom you've just described ? or are you trying to spark-submit the

Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-18 Thread Olivier Girardot
nse. > > Thanks > Nipun Batra > > > > On Fri, Apr 17, 2015 at 2:50 PM, Olivier Girardot > wrote: > >> Hi Nipun, >> I'm sorry but I don't understand exactly what your problem is ? >> Regarding the org.apache.spark.sql.Row, it does exists in th

Dataframe.fillna from 1.3.0

2015-04-20 Thread Olivier Girardot
Hi everyone, let's assume I'm stuck in 1.3.0, how can I benefit from the *fillna* API in PySpark, is there any efficient alternative to mapping the records myself ? Regards, Olivier.
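For reference, the fillna semantics being asked about boil down to per-record null replacement; a plain-Python sketch of what mapping the records yourself would compute (assumed semantics for illustration, not the actual 1.3.1 implementation):

```python
def fillna_record(record, value):
    """Replace None fields in a dict-shaped record with a default value."""
    return {k: (value if v is None else v) for k, v in record.items()}

rows = [{"name": "Alice", "age": None}, {"name": None, "age": 2}]
filled = [fillna_record(r, 0) for r in rows]
```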

Re: Dataframe.fillna from 1.3.0

2015-04-20 Thread Olivier Girardot
a UDF might be a good idea no ? Le lun. 20 avr. 2015 à 11:17, Olivier Girardot < o.girar...@lateral-thoughts.com> a écrit : > Hi everyone, > let's assume I'm stuck in 1.3.0, how can I benefit from the *fillna* API > in PySpark, is there any efficient alternative to map

Re: Dataframe.fillna from 1.3.0

2015-04-20 Thread Olivier Girardot
> On Mon, Apr 20, 2015 at 2:48 AM, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> a UDF might be a good idea no ? >> >> Le lun. 20 avr. 2015 à 11:17, Olivier Girardot < >> o.girar...@lateral-thoughts.com> a écrit : >>

Spark 1.2.2 prebuilt release for Hadoop 2.4 didn't get deployed

2015-04-21 Thread Olivier Girardot
Hi everyone, It seems some of the Spark 1.2.2 prebuilt versions (I tested mainly for Hadoop 2.4 and later) didn't get deployed on all the mirrors and cloudfront. Both the direct download and apache mirrors fail with dead links, for example : http://d3kbcqa49mib13.cloudfront.net/spark-1.2.2-bin-h

Re: Spark 1.2.2 prebuilt release for Hadoop 2.4 didn't get deployed

2015-04-21 Thread Olivier Girardot
ue, Apr 21, 2015 at 6:06 AM, Olivier Girardot > wrote: > > Hi everyone, > > It seems some of the Spark 1.2.2 prebuilt versions (I tested mainly > for > > Hadoop 2.4 and later) didn't get deployed on all the mirrors and > cloudfront. > > Both the direct down

Re: Spark Streaming updatyeStateByKey throws OutOfMemory Error

2015-04-21 Thread Olivier Girardot
Hi Sourav, Can you post your updateFunc as well please ? Regards, Olivier. Le mar. 21 avr. 2015 à 12:48, Sourav Chandra a écrit : > Hi, > > We are building a spark streaming application which reads from kafka, does > updateStateBykey based on the received message type and finally stores into >
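For context, an updateStateByKey update function has the shape below; this is a generic running-count sketch (an assumption for illustration, since Sourav's actual updateFunc is not shown in the thread):

```python
def update_func(new_values, current_state):
    """Fold newly arrived values for a key into its running state.

    current_state is None the first time a key is seen, mirroring
    the Option[State] semantics of updateStateByKey.
    """
    return (current_state or 0) + sum(new_values)

# Simulate three micro-batches for a single key.
state = None
for batch in ([1, 2], [], [3]):
    state = update_func(batch, state)
```

An OutOfMemory error with updateStateByKey is usually about unbounded state growth, which is why seeing the real update function matters.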

Spark build time

2015-04-21 Thread Olivier Girardot
Hi everyone, I was just wondering about the Spark full build time (including tests), 1h48 seems to me quite... spacious. What's taking most of the time ? Is the build mainly integration tests ? Is there any roadmap or jiras dedicated to that we can chip in ? Regards, Olivier.

Re: Spark build time

2015-04-22 Thread Olivier Girardot
a écrit : > It runs tons of integration tests. I think most developers just let > Jenkins run the full suite of them. > > On Tue, Apr 21, 2015 at 12:54 PM, Olivier Girardot > wrote: > >> Hi everyone, >> I was just wandering about the Spark full build time (including

Re: Dataframe.fillna from 1.3.0

2015-04-22 Thread Olivier Girardot
Reynold Xin a écrit : >> >>> You can just create fillna function based on the 1.3.1 implementation of >>> fillna, no? >>> >>> >>> On Mon, Apr 20, 2015 at 2:48 AM, Olivier Girardot < >>> o.girar...@lateral-thoughts.com> wrote: >

Re: Dataframe.fillna from 1.3.0

2015-04-22 Thread Olivier Girardot
I think I found the Coalesce you were talking about, but this is a catalyst class that I think is not available from pyspark Regards, Olivier. Le mer. 22 avr. 2015 à 11:56, Olivier Girardot < o.girar...@lateral-thoughts.com> a écrit : > Where should this *coalesce* come from ? Is it r
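The Coalesce expression discussed here returns its first non-null argument; a plain-Python sketch of that semantics (an illustration, not the Catalyst implementation):

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are,
    mirroring SQL's COALESCE semantics."""
    return next((v for v in values if v is not None), None)
```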

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
Yep no problem, but I can't seem to find the coalesce function in pyspark.sql.{*, functions, types or whatever :) } Olivier. Le lun. 20 avr. 2015 à 11:48, Olivier Girardot < o.girar...@lateral-thoughts.com> a écrit : > a UDF might be a good idea no ? > > Le lun. 20 avr. 2

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
yep :) I'll open the jira when I've got the time. Thanks Le jeu. 23 avr. 2015 à 19:31, Reynold Xin a écrit : > Ah damn. We need to add it to the Python list. Would you like to give it a > shot? > > > On Thu, Apr 23, 2015 at 4:31 AM, Olivier Girardot < > o.girar

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
What is the way of testing/building the pyspark part of Spark ? Le jeu. 23 avr. 2015 à 22:06, Olivier Girardot < o.girar...@lateral-thoughts.com> a écrit : > yep :) I'll open the jira when I've got the time. > Thanks > > Le jeu. 23 avr. 2015 à 19:31, Reynold Xin a é

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
ds an Array[Column] instead of just a list of arguments) But this seems very specific and very prone to future mistakes. Is there any way in Py4j to know before calling it the signature of a method ? Le jeu. 23 avr. 2015 à 22:17, Olivier Girardot < o.girar...@lateral-thoughts.com> a écrit :

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
I'll try thanks Le ven. 24 avr. 2015 à 00:09, Reynold Xin a écrit : > You can do it similar to the way countDistinct is done, can't you? > > > https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L78 > > > > On Thu, Apr 23, 2015 at 1:5

Re: Dataframe.fillna from 1.3.0

2015-04-24 Thread Olivier Girardot
done : https://github.com/apache/spark/pull/5683 and https://issues.apache.org/jira/browse/SPARK-7118 thx Le ven. 24 avr. 2015 à 07:34, Olivier Girardot < o.girar...@lateral-thoughts.com> a écrit : > I'll try thanks > > Le ven. 24 avr. 2015 à 00:09, Reynold Xin a écrit

Pandas' Shift in Dataframe

2015-04-29 Thread Olivier Girardot
Hi, Is there any plan to add the "shift" method from Pandas to Spark Dataframe, not that I think it's an easy task... c.f. http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.shift.html Regards, Olivier.
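For reference, pandas' shift moves values by n positions and fills the vacated slots with a missing marker; a plain-Python sketch of that semantics on a list (an illustration of the behavior being requested, not an implementation proposal):

```python
def shift(values, periods=1, fill_value=None):
    """Shift a list by `periods` positions (positive shifts toward the
    end), padding vacated slots with fill_value, like pandas' shift.
    The result always has the same length as the input."""
    n = len(values)
    if periods >= 0:
        k = min(periods, n)
        return [fill_value] * k + values[:n - k]
    k = min(-periods, n)
    return values[k:] + [fill_value] * k
```

In Spark this maps naturally onto the lag/lead window functions, which is where the thread eventually landed.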

Re: Pandas' Shift in Dataframe

2015-04-29 Thread Olivier Girardot
t any, then feel > free to create a JIRA and make the case there for why this would be a good > feature to add. > > Nick > > On Wed, Apr 29, 2015 at 7:30 AM Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Hi, >> Is there any plan to add the

Re: Pandas' Shift in Dataframe

2015-04-29 Thread Olivier Girardot
>> On Wed, Apr 29, 2015 at 1:08 PM, Nicholas Chammas < >> nicholas.cham...@gmail.com> wrote: >> >> > I can't comment on the direction of the DataFrame API (that's more for >> > Reynold or Michael I guess), but I just wanted to p

Re: Spark SQL cannot tolerate regexp with BIGINT

2015-04-29 Thread Olivier Girardot
I guess you can use cast(id as String) instead of just id in your where clause ? Le mer. 29 avr. 2015 à 12:13, lonely Feb a écrit : > Hi all, we are transfer our HIVE job into SparkSQL, but we found a litter > difference between HIVE and Spark SQL that our sql has a statement like: > > select A
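The suggested workaround amounts to converting the numeric id to a string before applying the regular expression; the equivalent check in plain Python (a sketch of the semantics, with a hypothetical helper name):

```python
import re

def regexp_matches(value, pattern):
    """Cast the value to a string, then test the regular expression,
    mirroring `cast(id as string) regexp '...'` in SQL."""
    return re.search(pattern, str(value)) is not None

hit = regexp_matches(1234567, r"^123")
```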

createDataFrame allows column names as second param in Python not in Scala

2015-05-02 Thread Olivier Girardot
Hi everyone, SQLContext.createDataFrame has different behaviour in Scala or Python : >>> l = [('Alice', 1)] >>> sqlContext.createDataFrame(l).collect() [Row(_1=u'Alice', _2=1)] >>> sqlContext.createDataFrame(l, ['name', 'age']).collect() [Row(name=u'Alice', age=1)] and in Scala : scala> val data

Re: Pandas' Shift in Dataframe

2015-05-02 Thread Olivier Girardot
To close this thread rxin created a broader Jira to handle window functions in Dataframes : https://issues.apache.org/jira/browse/SPARK-7322 Thanks everyone. Le mer. 29 avr. 2015 à 22:51, Olivier Girardot < o.girar...@lateral-thoughts.com> a écrit : > To give you a broader idea of th

Re: createDataFrame allows column names as second param in Python not in Scala

2015-05-03 Thread Olivier Girardot
itch between > languages). > > On Sat, May 2, 2015 at 11:05 AM, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Hi everyone, >> SQLContext.createDataFrame has different behaviour in Scala or Python : >> >> >>> l =

Multi-Line JSON in SparkSQL

2015-05-03 Thread Olivier Girardot
Hi everyone, Is there any way in Spark SQL to load multi-line JSON data efficiently ? I think there was in the mailing list a reference to http://pivotal-field-engineering.github.io/pmr-common/ for its JSONInputFormat But it's rather inaccessible considering the dependency is not available in any p
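The difficulty discussed in this thread is that a record pretty-printed across several lines cannot be parsed line by line, which is what a line-oriented input format assumes; the contrast in plain Python:

```python
import json

# One JSON record spread over four lines, as a pretty-printer would emit it.
doc = '{\n  "name": "Alice",\n  "age": 1\n}'

# Line-at-a-time parsing fails: no single line is a complete JSON value.
line_failures = 0
for line in doc.splitlines():
    try:
        json.loads(line)
    except ValueError:
        line_failures += 1

# Parsing the whole document at once succeeds.
record = json.loads(doc)
```

This is why splitting multi-line JSON efficiently requires either whole-file reads or an input format that can find real record boundaries.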

Re: Multi-Line JSON in SparkSQL

2015-05-03 Thread Olivier Girardot
s to scan from the beginning and parse the json properly, which > makes it not possible with large files (doable for whole input with a lot > of small files though). If there is a better way, we should do it. > > > On Sun, May 3, 2015 at 1:04 PM, Olivier Girardot < > o.girar...@l

Re: Multi-Line JSON in SparkSQL

2015-05-04 Thread Olivier Girardot
ibrary: >> > >> > https://github.com/alexholmes/json-mapreduce >> > >> > -- >> > Emre Sevinç >> > >> > >> > On Sun, May 3, 2015 at 10:04 PM, Olivier Girardot < >> > o.girar...@lateral-thoughts.com> wrote: >>

Re: Multi-Line JSON in SparkSQL

2015-05-04 Thread Olivier Girardot
ddle of a string, and thus the first { might just be part of a string, rather than a real JSON object starting position. >>> On Sun, May 3, 2015 at 11:13 PM, Emre Sevinc >

DataFrame distinct vs RDD distinct

2015-05-07 Thread Olivier Girardot
Hi everyone, there seem to be different implementations of the "distinct" feature in DataFrames and RDDs, and some performance issue with the DataFrame distinct API. In RDD.scala : def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope { map(x => (x, null)).reduceBy
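The RDD implementation quoted above turns distinct into map + reduceByKey; the same trick in plain Python, grouping on the value itself as the key and keeping one entry per key (an illustration of the algorithm, not Spark code):

```python
def distinct_via_reduce_by_key(items):
    """Emulate RDD.distinct: map each x to (x, None), reduce by key,
    then keep the keys. Order of the result is not guaranteed."""
    reduced = {}
    for x in items:
        # reduceByKey((u, v) => u) keeps exactly one entry per key
        reduced[x] = None
    return list(reduced.keys())

result = sorted(distinct_via_reduce_by_key([3, 1, 2, 3, 1]))
```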

Re: DataFrame distinct vs RDD distinct

2015-05-07 Thread Olivier Girardot
use the > Aggregate operator which will benefit from all the Tungsten optimizations, > or have a Tungsten version of distinct for SQL/DataFrame. > > On Thu, May 7, 2015 at 1:32 AM, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Hi everyone, >>

Re: NoClassDefFoundError with Spark 1.3

2015-05-08 Thread Olivier Girardot
You're trying to launch, via sbt run, something with a "provided" dependency; the goal of the "provided" scope is exactly to exclude this dependency from runtime, considering it as "provided" by the environment. Your configuration is correct to create an assembly jar - but not to use sbt run to test your proje

Re: DataFrame distinct vs RDD distinct

2015-05-08 Thread Olivier Girardot
arks that show better > performance for both the "fits in memory case" and the "too big for memory > case". > > On Thu, May 7, 2015 at 2:23 AM, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Ok, but for the moment, this seems to be

Re: @since version tag for all dataframe/sql methods

2015-05-13 Thread Olivier Girardot
that's a great idea ! Le mer. 13 mai 2015 à 07:38, Reynold Xin a écrit : > I added @since version tag for all public dataframe/sql methods/classes in > this patch: https://github.com/apache/spark/pull/6101/files > > From now on, if you merge anything related to DF/SQL, please make sure the > pub

Re: [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-25 Thread Olivier Girardot
I've just tested the new window functions using PySpark in the Spark 1.4.0 rc2 distribution for hadoop 2.4 with and without hive support. It works well with the hive support enabled distribution and fails as expected on the other one (with an explicit error : "Could not resolve window function 'le

Dataframe's .drop in PySpark doesn't accept Column

2015-05-29 Thread Olivier Girardot
Hi, Testing a bit more 1.4, it seems that the .drop() method in PySpark doesn't seem to accept a Column as input datatype : *.join(only_the_best, only_the_best.pol_no == df.pol_no, "inner").drop(only_the_best.pol_no)\* File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", li

Re: Dataframe's .drop in PySpark doesn't accept Column

2015-05-29 Thread Olivier Girardot
Actually, the Scala API too is only based on column name Le ven. 29 mai 2015 à 11:23, Olivier Girardot < o.girar...@lateral-thoughts.com> a écrit : > Hi, > Testing a bit more 1.4, it seems that the .drop() method in PySpark > doesn't seem to accept a Column as input datatyp

Re: Dataframe's .drop in PySpark doesn't accept Column

2015-05-30 Thread Olivier Girardot
talog. Regards, Olivier. Le sam. 30 mai 2015 à 09:54, Reynold Xin a écrit : > Yea would be great to support a Column. Can you create a JIRA, and > possibly a pull request? > > > On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: >

Re: Dataframe's .drop in PySpark doesn't accept Column

2015-05-31 Thread Olivier Girardot
vier. >> >> Le sam. 30 mai 2015 à 09:54, Reynold Xin a écrit : >> >>> Yea would be great to support a Column. Can you create a JIRA, and >>> possibly a pull request? >>> >>> >>> On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot <

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-02 Thread Olivier Girardot
Hi everyone, I think there's a blocker on PySpark the "when" functions in python seems to be broken but the Scala API seems fine. Here's a snippet demonstrating that with Spark 1.4.0 RC3 : In [*1*]: df = sqlCtx.createDataFrame([(1, "1"), (2, "2"), (1, "2"), (1, "2")], ["key", "value"]) In [*2*]:

PySpark on PyPi

2015-06-04 Thread Olivier Girardot
Hi everyone, Considering the python API as just a front-end needing SPARK_HOME defined anyway, I think it would be interesting to deploy the Python part of Spark on PyPi in order to handle the dependencies in a Python project needing PySpark via pip. For now I just symlink the python/pyspark in my

Re: PySpark on PyPi

2015-06-05 Thread Olivier Girardot
> This has been proposed before: >> https://issues.apache.org/jira/browse/SPARK-1267 >> >> There's currently tighter coupling between the Python and Java halves of >> PySpark than just requiring SPARK_HOME to be set; if we did this, I bet >> we'd run i

RandomForest evaluator for grid search

2015-07-13 Thread Olivier Girardot
Hi everyone, Using spark-ml there seems to be only BinaryClassificationEvaluator and RegressionEvaluator, is there any way or plan to provide a ROC-based or PR-based or F-Measure-based evaluator for multi-class ? I would be interested especially in evaluating and doing a grid search for a RandomForest model.

Re: RandomForest evaluator for grid search

2015-07-13 Thread Olivier Girardot
sues.apache.org/jira/browse/SPARK-7690> is tracking work on > this if you are interested in following the development. > > On Mon, Jul 13, 2015 at 2:16 AM, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Hi everyone, >> Using spark-ml there seems to b

countByValue on dataframe with multiple columns

2015-07-20 Thread Olivier Girardot
categorical value on multiple columns would be very useful. Regards, -- *Olivier Girardot* | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94
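A countByValue over several columns amounts to a per-column frequency table; a plain-Python sketch of the bookkeeping being requested (hypothetical helper, rows modeled as dicts):

```python
from collections import Counter

def count_by_value(rows, columns):
    """Return {column: Counter of values} for the requested columns,
    i.e. a per-column categorical frequency table."""
    counts = {c: Counter() for c in columns}
    for row in rows:
        for c in columns:
            counts[c][row[c]] += 1
    return counts

rows = [{"color": "red", "size": "S"},
        {"color": "red", "size": "M"},
        {"color": "blue", "size": "S"}]
freq = count_by_value(rows, ["color", "size"])
```

On a DataFrame the same result comes from one groupBy().count() per column, which is the direction the replies below take.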

Re: countByValue on dataframe with multiple columns

2015-07-21 Thread Olivier Girardot
> > df.groupBy(h, r:_*).count() > } > > countByValueDf(df).show() > > > Cheers, > Jon > > On 20 July 2015 at 11:28, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: > >> Hi, >> Is there any plan to add the countByValue function to

Re: countByValue on dataframe with multiple columns

2015-07-21 Thread Olivier Girardot
park-dataframes/ >> >> This is a very common use case. If there is a +1 I would love to add it >> to dataframes. >> >> Let me know >> Ted Malaska >> >> On Tue, Jul 21, 2015 at 7:24 AM, Olivier Girardot < >> o.girar...@lateral-thoughts.com>

Re: countByValue on dataframe with multiple columns

2015-07-21 Thread Olivier Girardot
>> On Tue, Jul 21, 2015 at 7:39 AM, Ted Malaska >> wrote: >> >>> 100% I would love to do it. Who's a good person to review the design >>> with? All I need is a quick chat about the design and approach and I'll >>> create the jira and pus

Re: Spark CBO

2015-07-31 Thread Olivier Girardot
ss.com > <http://burakisikli.wordpress.com>* > > -- *Olivier Girardot* | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94

Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-08-16 Thread Olivier Girardot
>> These are usable by adding this repository in your build and using a snapshot version (e.g. 1.3.2-SNAPSHOT). >> 2. Nightly binary package builds and doc builds of master and release versions. >> http://people.apache.org/~pwendell/spark-nightly/ >> These build 4 times per day and are tagged based on commits. >> If anyone has feedback on these please let me know. >> Thanks! - Patrick -- *Olivier Girardot* | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94

Re: ClassCastException using DataFrame only when num-executors > 2 ...

2015-08-31 Thread Olivier Girardot
n$8$$anon$1.next(Window.scala:252) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 2015-08-26 11:47 GMT+02:00 Olivier Girardot : > Hi everyone, > I know this "post title" doesn't seem very logical and I agree, > we have a very complex computation usin

Re: ClassCastException using DataFrame only when num-executors > 2 ...

2015-09-26 Thread Olivier Girardot
sorry for the delay, yes still. I'm still trying to figure out if it comes from bad data and trying to isolate the bug itself... 2015-09-11 0:28 GMT+02:00 Reynold Xin : > Does this still happen on 1.5.0 release? > > > On Mon, Aug 31, 2015 at 9:31 AM, Olivier Girardot > wr