Re: Dataframe.fillna from 1.3.0

2015-04-24 Thread Reynold Xin
The changes look good to me. Jenkins is somehow not responding. Will merge once Jenkins comes back happy. On Fri, Apr 24, 2015 at 2:38 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > done : https://github.com/apache/spark/pull/5683 and > https://issues.apache.org/jira/browse/SPA

Re: Dataframe.fillna from 1.3.0

2015-04-24 Thread Olivier Girardot
done: https://github.com/apache/spark/pull/5683 and https://issues.apache.org/jira/browse/SPARK-7118 thanks. On Fri, Apr 24, 2015 at 07:34, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > I'll try, thanks > > On Fri, Apr 24, 2015 at 00:09, Reynold Xin wrote: > >> You can do it simil

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
I'll try, thanks. On Fri, Apr 24, 2015 at 00:09, Reynold Xin wrote: > You can do it similar to the way countDistinct is done, can't you? > > > https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L78 > > > > On Thu, Apr 23, 2015 at 1:59 PM, Olivier Girardot < > o.girar...@

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Reynold Xin
You can do it similar to the way countDistinct is done, can't you? https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L78 On Thu, Apr 23, 2015 at 1:59 PM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > I found another way setting a SPARK_HOME on a release
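The countDistinct wrapper referenced here packs Python varargs into a single sequence before crossing into the JVM. A toy model of that calling pattern, with plain Python stand-ins (nothing below is real PySpark API; the JVM side is faked with an ordinary function):

```python
# Toy model of the countDistinct-style wrapper pattern from
# python/pyspark/sql/functions.py: Python *cols varargs are packed into
# one sequence and handed to a single JVM entry point.  fake_jvm_coalesce
# is a stand-in for the JVM call, not a real PySpark function.
def fake_jvm_coalesce(first, rest):
    # The JVM side would receive (Column, Seq[Column]); modeled as values.
    for v in [first] + rest:
        if v is not None:
            return v
    return None

def coalesce(col, *cols):
    # The Python side forwards the varargs as one list -- the same shape
    # countDistinct uses to bridge Python *args to a Scala varargs method.
    return fake_jvm_coalesce(col, list(cols))

print(coalesce(None, None, 7))  # -> 7
```

The point of the pattern is that only one Python-to-JVM call is made, no matter how many columns are passed.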

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
I found another way: setting SPARK_HOME to a released version and launching an IPython shell to load the contexts. I may need your insight, however. I found out why it hasn't been done at the same time: this method (like some others) uses varargs in Scala, and for now the way functions are called, only one p

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Reynold Xin
You need to first have the Spark assembly jar built with "sbt/sbt assembly/assembly". Then usually I go into python/run-tests and comment out the non-SQL tests:

#run_core_tests
run_sql_tests
#run_mllib_tests
#run_ml_tests
#run_streaming_tests

And then you can run "python/run-tests". On Thu, Ap

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
What is the way to test/build the PySpark part of Spark? On Thu, Apr 23, 2015 at 22:06, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > yep :) I'll open the jira when I've got the time. > Thanks > > On Thu, Apr 23, 2015 at 19:31, Reynold Xin wrote: > >> Ah damn. We need t

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
yep :) I'll open the jira when I've got the time. Thanks. On Thu, Apr 23, 2015 at 19:31, Reynold Xin wrote: > Ah damn. We need to add it to the Python list. Would you like to give it a > shot? > > > On Thu, Apr 23, 2015 at 4:31 AM, Olivier Girardot < > o.girar...@lateral-thoughts.com> wrote: >

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Reynold Xin
Ah damn. We need to add it to the Python list. Would you like to give it a shot? On Thu, Apr 23, 2015 at 4:31 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Yep, no problem, but I can't seem to find the coalesce function in > pyspark.sql.{*, functions, types or whatever :) } > >

Re: Dataframe.fillna from 1.3.0

2015-04-23 Thread Olivier Girardot
Yep, no problem, but I can't seem to find the coalesce function in pyspark.sql.{*, functions, types or whatever :) } Olivier. On Mon, Apr 20, 2015 at 11:48, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > a UDF might be a good idea, no? > > On Mon, Apr 20, 2015 at 11:17, Olivier Gir

Re: Dataframe.fillna from 1.3.0

2015-04-22 Thread Reynold Xin
It is actually different: the coalesce expression picks the first value that is not null: https://msdn.microsoft.com/en-us/library/ms190349.aspx It would be great to update the documentation for it (both Scala and Java) to explain that it is different from the coalesce function on a DataFrame/RDD. Do y
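The SQL COALESCE semantics described here, picking the first non-null value, can be sketched in plain Python (this illustrates the expression's behavior; it is not PySpark code):

```python
def sql_coalesce(*args):
    # SQL COALESCE: return the first argument that is not NULL (None here).
    for a in args:
        if a is not None:
            return a
    return None

# Applied row-wise, this is what the Catalyst Coalesce expression
# computes -- and why coalescing a column with a literal 0.0 behaves
# like filling that column's nulls with 0.0:
rows = [1.0, None, 3.5]
filled = [sql_coalesce(a, 0.0) for a in rows]
print(filled)  # -> [1.0, 0.0, 3.5]
```

This is unrelated to the coalesce method on a DataFrame/RDD, which merges partitions, as the thread goes on to clarify.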

Re: Dataframe.fillna from 1.3.0

2015-04-22 Thread Olivier Girardot
I think I found the Coalesce you were talking about, but it is a Catalyst class that, I think, is not available from PySpark. Regards, Olivier. On Wed, Apr 22, 2015 at 11:56, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Where should this *coalesce* come from ? Is it related to

Re: Dataframe.fillna from 1.3.0

2015-04-22 Thread Olivier Girardot
Where should this *coalesce* come from? Is it related to the partition-manipulation coalesce method? Thanks! On Mon, Apr 20, 2015 at 22:48, Reynold Xin wrote: > Ah ic. You can do something like > > > df.select(coalesce(df("a"), lit(0.0))) > > On Mon, Apr 20, 2015 at 1:44 PM, Olivier Girardo

Re: Dataframe.fillna from 1.3.0

2015-04-20 Thread Reynold Xin
Ah, I see. You can do something like df.select(coalesce(df("a"), lit(0.0))) On Mon, Apr 20, 2015 at 1:44 PM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > From PySpark it seems to me that the fillna is relying on Java/Scala code, > that's why I was wondering. > Thank you for answerin
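The Scala snippet above has no direct PySpark 1.3 equivalent, since pyspark.sql.functions does not yet expose coalesce (the gap this thread is about). One hedged workaround is to reach the Catalyst COALESCE expression through a SQL string; only the query construction below is exercised, while the SQLContext lines are an assumption kept in comments:

```python
def coalesce_select(table, column, default):
    # Build the Spark SQL text for a COALESCE-based fill of one column.
    return "SELECT COALESCE({c}, {d}) AS {c} FROM {t}".format(
        c=column, d=default, t=table)

query = coalesce_select("t", "a", 0.0)
print(query)  # -> SELECT COALESCE(a, 0.0) AS a FROM t
# With a live SQLContext (assumption, not exercised here):
#   df.registerTempTable("t")
#   filled = sqlContext.sql(query)
```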

Re: Dataframe.fillna from 1.3.0

2015-04-20 Thread Olivier Girardot
From PySpark it seems to me that fillna is relying on Java/Scala code; that's why I was wondering. Thank you for answering :) On Mon, Apr 20, 2015 at 22:22, Reynold Xin wrote: > You can just create fillna function based on the 1.3.1 implementation of > fillna, no? > > > On Mon, Apr 20, 20

Re: Dataframe.fillna from 1.3.0

2015-04-20 Thread Reynold Xin
You can just create a fillna function based on the 1.3.1 implementation of fillna, no? On Mon, Apr 20, 2015 at 2:48 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > a UDF might be a good idea, no? > > On Mon, Apr 20, 2015 at 11:17, Olivier Girardot < > o.girar...@lateral-thoughts.co

Re: Dataframe.fillna from 1.3.0

2015-04-20 Thread Olivier Girardot
a UDF might be a good idea, no? On Mon, Apr 20, 2015 at 11:17, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi everyone, > let's assume I'm stuck in 1.3.0, how can I benefit from the *fillna* API > in PySpark, is there any efficient alternative to mapping the records > myself ?
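The UDF idea floated here can be sketched as follows; the fill rule itself is plain Python, and the PySpark wiring (udf, DoubleType, the name fill_zero) is an illustrative assumption for a 1.3.0 setup, kept in comments:

```python
def fill_value(v, default=0.0):
    # The per-cell rule a fillna UDF would apply: replace None, keep the rest.
    return default if v is None else v

# Hypothetical PySpark 1.3.0 wiring (assumption, not exercised here):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import DoubleType
#   fill_zero = udf(lambda v: fill_value(v, 0.0), DoubleType())
#   df.select(fill_zero(df["a"]).alias("a"))

print([fill_value(v) for v in [None, 2.5, None]])  # -> [0.0, 2.5, 0.0]
```

A UDF-based fill pays Python serialization cost per row, which is why a native fillna (as added in 1.3.1) is preferable when available.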

Dataframe.fillna from 1.3.0

2015-04-20 Thread Olivier Girardot
Hi everyone, let's assume I'm stuck on 1.3.0: how can I benefit from the *fillna* API in PySpark? Is there any efficient alternative to mapping the records myself? Regards, Olivier.
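For completeness, the "mapping the records myself" baseline mentioned here can be sketched with plain dictionaries standing in for Rows; the DataFrame wiring in the comment is an assumption for a 1.3.0 setup:

```python
def fill_row(row, defaults):
    # Replace None values in one record with per-column defaults;
    # columns without a default stay None.
    return {k: (defaults.get(k) if v is None else v)
            for k, v in row.items()}

records = [{"a": 1.0, "b": None}, {"a": None, "b": 2.0}]
filled = [fill_row(r, {"a": 0.0, "b": 0.0}) for r in records]
print(filled)  # -> [{'a': 1.0, 'b': 0.0}, {'a': 0.0, 'b': 2.0}]
# On a DataFrame in 1.3.0 (assumption, not exercised here):
#   df.map(lambda r: fill_row(r.asDict(), {"a": 0.0, "b": 0.0}))
```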