point.sh used in the kubernetes packaging)
- We can add a simple step to the init container that tries the DNS
resolution and fails after 60s if it does not succeed
But these steps won't change the fact that the driver will stay stuck,
thinking we're still in the case of the
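(A minimal sketch, in Python, of the init-container check described above; the driver service hostname and retry cadence are illustrative assumptions, not the actual packaging:)

import socket
import sys
import time

# Retry DNS resolution of the driver service for up to 60s, then exit
# non-zero so the pod fails fast instead of hanging.
HOST = "spark-driver-svc.default.svc.cluster.local"  # hypothetical name
deadline = time.time() + 60
while time.time() < deadline:
    try:
        socket.gethostbyname(HOST)
        sys.exit(0)  # resolution succeeded, let the main container start
    except socket.gaierror:
        time.sleep(2)  # transient DNS failure, retry
sys.exit(1)  # still unresolved after 60s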
ed on other vendors? Also on
> the kubelet nodes did you notice any pressure on the DNS side?
>
> Li
>
>
> On Mon, Apr 29, 2019, 5:43 AM Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Hi everyone,
>> I have ~300 Spark jobs on Kubernetes (GKE)
I am also facing the same issue on my Kubernetes
> cluster (v1.11.5) on AWS with Spark version 2.3.3. Any luck in figuring out
> the root cause?
>
> On Fri, May 3, 2019 at 5:37 AM Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Hi,
Hi everyone,
FYI the Spark source download link on spark.apache.org is dead:
https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-sources.tgz
Regards,
--
*Olivier Girardot*
h .replace doesn't seem to have ever worked?
> https://github.com/apache/spark-website/pull/207
>
> On Tue, Jun 18, 2019 at 4:07 AM Olivier Girardot
> wrote:
> >
> > Hi everyone,
> > FYI the Spark source download link on spark.apache.org is dead:
> >
Hi everyone,
Is there any known way to go from a Spark SQL Logical Plan (optimized?)
back to a SQL query?
Regards,
Olivier.
cannot assign instance of
scala.collection.immutable.List$SerializationProxy to field
org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type
scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
Olivier Girardot | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
Hi everyone, I'm currently trying to use Spark 2.0.0 and to make DataFrames
work with kryo.registrationRequired=true. Is it even possible at all,
considering the codegen?
Regards,
Olivier Girardot | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
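(For reference, a minimal sketch of the configuration in question; the registered class name is a hypothetical placeholder:)

from pyspark import SparkConf, SparkContext

# Sketch of the setup being discussed; org.example.MyClass stands in for
# whatever application classes would need registering.
conf = (SparkConf()
        .setAppName("kryo-registration-required")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrationRequired", "true")
        .set("spark.kryo.classesToRegister", "org.example.MyClass"))
sc = SparkContext(conf=conf)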
> wrote:
Hi Olivier,
I don't know either, but am curious what you've tried already.
Jacek
On 3 Aug 2016 10:50 a.m., "Olivier Girardot" < o.girardot@lateral-thoughts.com
> wrote:
Hi everyone, I'm currently trying to use Spark 2.0.0 and to make DataFrames
work with kryo.regis
aggExprs).map { pairExpr =>
strToExpr(pairExpr._2)(df(pairExpr._1).expr) }.toSeq) }
regards --
Ing. Ivaldi Andres
Olivier Girardot | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
ly a transformation on complex nested
datatypes (arrays and structs) on a DataFrame, updating the value itself.
Regards,
Olivier Girardot
Is there by any
chance publications of Spark 2.0.0 with different classifiers according to
the different versions of Hadoop available?
Thanks for your time!
Olivier Girardot
Olivier Girardot | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
, because the Hadoop APIs that are used are all the same across these
versions. That would be the thing that makes you need multiple versions of the
artifact under multiple classifiers.
On Wed, Sep 28, 2016 at 1:16 PM, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
ok, don't
by computations, but that's bound to be inefficient
* or to generate bytecode using the schema to do the nested
"getRow, getSeq…" and re-create the rows once the transformation is applied
I'd like to open an issue regarding that use case because it's not the first or
last tim
evangelist. She has written a
few books on Spark and has made frequent contributions to the Python API to
improve its usability and performance.
Please join me in welcoming the two!
Olivier Girardot | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
filter pushdown issues with complex
> datatypes?
>
> Thanks!
>
--
*Olivier Girardot* | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
JIRA or is there a workaround?
Regards,
--
*Olivier Girardot* | Associé
o.girar...@lateral-thoughts.com
Hi,
I could not reproduce this; what kind of JDK are you using for the zinc
server?
Regards,
Olivier.
2015-02-11 5:08 GMT+01:00 Yi Tian :
> Hi, all
>
> I got an ERROR when I build spark master branch with maven (commit:
> 2d1e916730492f5d61b97da6c483d3223ca44315)
>
> [INFO]
> [INFO]
> --
regression introduced by the 1.3.x DataFrame, because
JavaSchemaRDD used to be JavaRDDLike but DataFrames are not (and are
not callable with JFunctions). I can open a Jira if you want?
Regards,
--
*Olivier Girardot* | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
Yes, thanks!
On Fri, Apr 17, 2015 at 16:20, Ted Yu wrote:
> The image didn't go through.
>
> I think you were referring to:
> override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)
>
> Cheers
>
> On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot <
Ok, do you want me to open a pull request to fix the dedicated
documentation?
On Fri, Apr 17, 2015 at 18:14, Reynold Xin wrote:
> I think in 1.3 and above, you'd need to do
>
> .sql(...).javaRDD().map(..)
>
> On Fri, Apr 17, 2015 at 9:22 AM, Olivier Girardot <
Is there any convention *not* to show Java 8 versions in the documentation?
On Fri, Apr 17, 2015 at 21:39, Reynold Xin wrote:
> Please do! Thanks.
>
>
> On Fri, Apr 17, 2015 at 2:36 PM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Ok,
ill more Java 7 users than 8.
>
>
> On Fri, Apr 17, 2015 at 3:36 PM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Is there any convention *not* to show Java 8 versions in the
>> documentation?
>>
>> On Fri, Apr 17, 2015 at 21:39, Reynold Xin
>
>
> On Fri, Apr 17, 2015 at 3:36 PM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Is there any convention *not* to show Java 8 versions in the
>> documentation?
>>
>> On Fri, Apr 17, 2015 at 21:39, Reynold Xin wrote:
>>
>
Hi Nipun,
I'm sorry but I don't understand exactly what your problem is.
Regarding org.apache.spark.sql.Row, it does exist in the Spark SQL
dependency.
Is it a compilation problem?
Are you trying to run a main method using the pom you've just described?
Or are you trying to spark-submit the
nse.
>
> Thanks
> Nipun Batra
>
>
>
> On Fri, Apr 17, 2015 at 2:50 PM, Olivier Girardot
> wrote:
>
>> Hi Nipun,
>> I'm sorry but I don't understand exactly what your problem is.
>> Regarding org.apache.spark.sql.Row, it does exist in th
Hi everyone,
let's assume I'm stuck on 1.3.0: how can I benefit from the *fillna* API in
PySpark? Is there any efficient alternative to mapping the records myself?
Regards,
Olivier.
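(One hedged sketch of a 1.3.0-era substitute using only the public UDF API; the helper name and the DoubleType assumption are illustrative:)

from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

# Illustrative helper, not part of the 1.3.0 API: replace nulls in one
# numeric column with a default, rebuilding the projection so the column
# keeps its name and position.
def fill_na_column(df, col_name, default):
    fill = udf(lambda v: default if v is None else v, DoubleType())
    return df.select(*[fill(df[c]).alias(c) if c == col_name else df[c]
                       for c in df.columns])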
a UDF might be a good idea, no?
On Mon, Apr 20, 2015 at 11:17, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> Hi everyone,
> let's assume I'm stuck on 1.3.0: how can I benefit from the *fillna* API
> in PySpark? Is there any efficient alternative to map
>
> On Mon, Apr 20, 2015 at 2:48 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> a UDF might be a good idea, no?
>>
>> On Mon, Apr 20, 2015 at 11:17, Olivier Girardot <
>> o.girar...@lateral-thoughts.com> wrote:
>>
>>
Hi everyone,
It seems that some of the Spark 1.2.2 prebuilt versions (I tested mainly for
Hadoop 2.4 and later) didn't get deployed on all the mirrors and CloudFront.
Both the direct download and Apache mirror links fail with dead links, for
example: http://d3kbcqa49mib13.cloudfront.net/spark-1.2.2-bin-h
ue, Apr 21, 2015 at 6:06 AM, Olivier Girardot
> wrote:
> > Hi everyone,
> > It seems that some of the Spark 1.2.2 prebuilt versions (I tested mainly
> for
> > Hadoop 2.4 and later) didn't get deployed on all the mirrors and
> cloudfront.
> > Both the direct down
Hi Sourav,
Can you post your updateFunc as well, please?
Regards,
Olivier.
On Tue, Apr 21, 2015 at 12:48, Sourav Chandra
wrote:
> Hi,
>
> We are building a Spark Streaming application which reads from Kafka, does
> updateStateByKey based on the received message type and finally stores into
>
Hi everyone,
I was just wondering about the Spark full build time (including tests):
1h48 seems to me quite... spacious. What's taking most of the time? Is the
build mainly integration tests? Are there any roadmap items or JIRAs
dedicated to that we can chip in on?
Regards,
Olivier.
wrote:
> It runs tons of integration tests. I think most developers just let
> Jenkins run the full suite of them.
>
> On Tue, Apr 21, 2015 at 12:54 PM, Olivier Girardot
> wrote:
>
>> Hi everyone,
>> I was just wondering about the Spark full build time (including
Reynold Xin wrote:
>>
>>> You can just create fillna function based on the 1.3.1 implementation of
>>> fillna, no?
>>>
>>>
>>> On Mon, Apr 20, 2015 at 2:48 AM, Olivier Girardot <
>>> o.girar...@lateral-thoughts.com> wrote:
>
I think I found the Coalesce you were talking about, but it is a Catalyst
class that I think is not available from PySpark.
Regards,
Olivier.
On Wed, Apr 22, 2015 at 11:56, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> Where should this *coalesce* come from? Is it r
Yep, no problem, but I can't seem to find the coalesce function in
pyspark.sql.{*, functions, types or whatever :) }
Olivier.
On Mon, Apr 20, 2015 at 11:48, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> a UDF might be a good idea, no?
>
> On Mon, Apr 20, 2
yep :) I'll open the JIRA when I've got the time.
Thanks
On Thu, Apr 23, 2015 at 19:31, Reynold Xin wrote:
> Ah damn. We need to add it to the Python list. Would you like to give it a
> shot?
>
>
> On Thu, Apr 23, 2015 at 4:31 AM, Olivier Girardot <
> o.girar
What is the way of testing/building the PySpark part of Spark?
On Thu, Apr 23, 2015 at 22:06, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> yep :) I'll open the JIRA when I've got the time.
> Thanks
>
> On Thu, Apr 23, 2015 at 19:31, Reynold Xin
ds an Array[Column] instead of just a list of arguments)
But this seems very specific and very prone to future mistakes.
Is there any way in Py4j to know, before calling it, the signature of a
method?
On Thu, Apr 23, 2015 at 22:17, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
I'll try, thanks
On Fri, Apr 24, 2015 at 00:09, Reynold Xin wrote:
> You can do it similar to the way countDistinct is done, can't you?
>
>
> https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L78
>
>
>
> On Thu, Apr 23, 2015 at 1:5
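(For context, a sketch of the countDistinct-style wrapper being discussed; _to_seq and _to_java_column are pyspark internals whose exact location has moved between versions, so treat the imports as assumptions:)

from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column, _to_seq

# Pack the Python Columns into a Scala Seq[Column] before calling the
# variadic JVM-side function, as the countDistinct wrapper does.
def coalesce(*cols):
    sc = SparkContext._active_spark_context
    jc = sc._jvm.functions.coalesce(_to_seq(sc, cols, _to_java_column))
    return Column(jc)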
done: https://github.com/apache/spark/pull/5683 and
https://issues.apache.org/jira/browse/SPARK-7118
thx
On Fri, Apr 24, 2015 at 07:34, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> I'll try, thanks
>
> On Fri, Apr 24, 2015 at 00:09, Reynold Xin wrote
Hi,
Is there any plan to add the "shift" method from Pandas to Spark DataFrames?
Not that I think it's an easy task...
c.f.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.shift.html
Regards,
Olivier.
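(For what it's worth, the lag window function introduced in 1.4 gives a rough equivalent of shift(1); the column names below are illustrative assumptions:)

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Rough equivalent of pandas' shift(1), assuming a DataFrame `df` with an
# ordering column `ts` and a value column `value`.
w = Window.orderBy("ts")
shifted = df.withColumn("value_shifted", F.lag("value", 1).over(w))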
t any, then feel
> free to create a JIRA and make the case there for why this would be a good
> feature to add.
>
> Nick
>
> On Wed, Apr 29, 2015 at 7:30 AM Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Hi,
>> Is there any plan to add the
>
>>
>> On Wed, Apr 29, 2015 at 1:08 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>> > I can't comment on the direction of the DataFrame API (that's more for
>> > Reynold or Michael I guess), but I just wanted to p
I guess you can use cast(id as String) instead of just id in your where
clause?
On Wed, Apr 29, 2015 at 12:13, lonely Feb wrote:
> Hi all, we are migrating our Hive jobs to Spark SQL, but we found a little
> difference between Hive and Spark SQL: our SQL has a statement like:
>
> select A
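(A hedged sketch of the suggested rewrite; the table and column names are placeholders:)

# Illustrative only: force a string comparison by casting inside the WHERE clause.
sqlContext.sql("SELECT a FROM t WHERE CAST(id AS STRING) = '42'")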
Hi everyone,
SQLContext.createDataFrame has different behaviour in Scala and Python:
>>> l = [('Alice', 1)]
>>> sqlContext.createDataFrame(l).collect()
[Row(_1=u'Alice', _2=1)]
>>> sqlContext.createDataFrame(l, ['name', 'age']).collect()
[Row(name=u'Alice', age=1)]
and in Scala :
scala> val data
To close this thread, rxin created a broader JIRA to handle window functions
in DataFrames: https://issues.apache.org/jira/browse/SPARK-7322
Thanks everyone.
On Wed, Apr 29, 2015 at 22:51, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> To give you a broader idea of th
itch between
> languages).
>
> On Sat, May 2, 2015 at 11:05 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Hi everyone,
>> SQLContext.createDataFrame has different behaviour in Scala and Python:
>>
>> >>> l =
Hi everyone,
Is there any way in Spark SQL to load multi-line JSON data efficiently? I
think there was a reference on this mailing list to
http://pivotal-field-engineering.github.io/pmr-common/ for its
JSONInputFormat,
but it's rather inaccessible considering the dependency is not available in
any p
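(One workaround sketch, assuming each input file holds exactly one, possibly multi-line, JSON document; the path is illustrative:)

# Read each file whole; jsonRDD doesn't care about line boundaries within
# a single element, only about one JSON document per element.
raw = sc.wholeTextFiles("hdfs:///data/json/").values()
df = sqlContext.jsonRDD(raw)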
s to scan from the beginning and parse the json properly, which
> makes it not possible with large files (doable for whole input with a lot
> of small files though). If there is a better way, we should do it.
>
>
> On Sun, May 3, 2015 at 1:04 PM, Olivier Girardot <
> o.girar...@l
ibrary:
>> >
>> > https://github.com/alexholmes/json-mapreduce
>> >
>> > --
>> > Emre Sevinç
>> >
>> >
>> > On Sun, May 3, 2015 at 10:04 PM, Olivier Girardot <
>> > o.girar...@lateral-thoughts.com> wrote:
>>
ddle of a string, and thus the first { might just be part of a string,
> >>> rather than a real JSON object starting position.
> >>>
> >>>
> >>> On Sun, May 3, 2015 at 11:13 PM, Emre Sevinc
> >
Hi everyone,
there seem to be different implementations of the "distinct" feature in
DataFrames and RDDs, and some performance issues with the DataFrame distinct
API.
In RDD.scala:
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] =
withScope { map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1) }
use the
> Aggregate operator which will benefit from all the Tungsten optimizations,
> or have a Tungsten version of distinct for SQL/DataFrame.
>
> On Thu, May 7, 2015 at 1:32 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Hi everyone,
>>
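(A sketch of the idea in the thread: express distinct as an aggregation so it runs through the Aggregate operator; `df` is an arbitrary DataFrame and the approach is an assumption, not the implemented fix:)

# Group on all columns, then throw away the synthetic count column.
distinct_df = df.groupBy(*df.columns).count().drop("count")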
You're trying to launch, via sbt run, an application with a "provided"
dependency: the goal of the "provided" scope is exactly to exclude this
dependency from the runtime classpath, considering it "provided" by the
environment.
Your configuration is correct for creating an assembly jar - but not for
using sbt run to test your proje
arks that show better
> performance for both the "fits in memory case" and the "too big for memory
> case".
>
> On Thu, May 7, 2015 at 2:23 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Ok, but for the moment, this seems to be
that's a great idea!
On Wed, May 13, 2015 at 07:38, Reynold Xin wrote:
> I added @since version tag for all public dataframe/sql methods/classes in
> this patch: https://github.com/apache/spark/pull/6101/files
>
> From now on, if you merge anything related to DF/SQL, please make sure the
> pub
I've just tested the new window functions using PySpark in the Spark 1.4.0
RC2 distribution for Hadoop 2.4, with and without Hive support.
It works well with the Hive-enabled distribution and fails as
expected on the other one (with an explicit error: "Could not resolve
window function 'le
Hi,
Testing 1.4 a bit more, it seems that the .drop() method in PySpark doesn't
accept a Column as input datatype:
.join(only_the_best, only_the_best.pol_no == df.pol_no,
"inner").drop(only_the_best.pol_no)\
File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", li
Actually, the Scala API too only accepts a column name.
On Fri, May 29, 2015 at 11:23, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> Hi,
> Testing 1.4 a bit more, it seems that the .drop() method in PySpark
> doesn't accept a Column as input datatyp
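(A workaround sketch reusing the names from the thread: select the surviving columns explicitly instead of dropping by Column:)

# After the join both sides carry pol_no, so keep only the left-hand columns.
joined = df.join(only_the_best, only_the_best.pol_no == df.pol_no, "inner")
result = joined.select(*[df[c] for c in df.columns])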
talog.
Regards,
Olivier.
On Sat, May 30, 2015 at 09:54, Reynold Xin wrote:
> Yea would be great to support a Column. Can you create a JIRA, and
> possibly a pull request?
>
>
> On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>
Olivier.
>>
>> On Sat, May 30, 2015 at 09:54, Reynold Xin wrote:
>>
>>> Yea would be great to support a Column. Can you create a JIRA, and
>>> possibly a pull request?
>>>
>>>
>>> On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot <
Hi everyone,
I think there's a blocker on PySpark: the "when" function in Python seems
to be broken, but the Scala API seems fine.
Here's a snippet demonstrating it with Spark 1.4.0 RC3:
In [*1*]: df = sqlCtx.createDataFrame([(1, "1"), (2, "2"), (1, "2"), (1,
"2")], ["key", "value"])
In [*2*]:
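(For reference, the kind of expression at stake, as a hedged sketch on the df built above; the literals are illustrative:)

from pyspark.sql import functions as F

# A when/otherwise projection of the sort the thread reports as broken.
df.select(F.when(df.key == 1, "one").otherwise("other").alias("label")).show()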
Hi everyone,
Considering the Python API is just a front-end needing SPARK_HOME defined
anyway, I think it would be interesting to deploy the Python part of Spark
on PyPI in order to handle the dependencies of a Python project needing
PySpark via pip.
For now I just symlink python/pyspark in my
>> This has been proposed before:
>> https://issues.apache.org/jira/browse/SPARK-1267
>>
>> There's currently tighter coupling between the Python and Java halves of
>> PySpark than just requiring SPARK_HOME to be set; if we did this, I bet
>> we'd run i
Hi everyone,
Using spark-ml, there seem to be only BinaryClassificationEvaluator and
RegressionEvaluator. Is there any way or plan to provide a ROC-based,
PR-based, or F-measure-based evaluator for multi-class problems? I would be
especially interested in evaluating and doing a grid search for a
RandomForest model.
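(As a stopgap sketch, the mllib MulticlassMetrics wrapper can compute these; `predictions` is assumed to be a DataFrame with double-typed prediction and label columns, e.g. a RandomForest model's output:)

from pyspark.mllib.evaluation import MulticlassMetrics

# Turn the predictions into (prediction, label) pairs and score them.
pairs = predictions.select("prediction", "label") \
                   .rdd.map(lambda row: (row.prediction, row.label))
metrics = MulticlassMetrics(pairs)
print(metrics.weightedFMeasure())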
sues.apache.org/jira/browse/SPARK-7690> is tracking work on
> this if you are interested in following the development.
>
> On Mon, Jul 13, 2015 at 2:16 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Hi everyone,
>> Using spark-ml there seems to b
categorical value on multiple columns would be very useful.
Regards,
--
*Olivier Girardot* | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
>
> df.groupBy(h, r:_*).count()
> }
>
> countByValueDf(df).show()
>
>
> Cheers,
> Jon
>
> On 20 July 2015 at 11:28, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Hi,
>> Is there any plan to add the countByValue function to
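(A per-column sketch of the same idea in PySpark, mirroring the quoted Scala; `df` is an arbitrary DataFrame:)

# countByValue per column: group on each column in turn and count.
for col_name in df.columns:
    df.groupBy(col_name).count().show()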
park-dataframes/
>>
>> This is a very common use case. If there is a +1 I would love to add it
>> to dataframes.
>>
>> Let me know
>> Ted Malaska
>>
>> On Tue, Jul 21, 2015 at 7:24 AM, Olivier Girardot <
>> o.girar...@lateral-thoughts.com>
>>
>> On Tue, Jul 21, 2015 at 7:39 AM, Ted Malaska
>> wrote:
>>
>>> 100% I would love to do it. Who's a good person to review the design
>>> with? All I need is a quick chat about the design and approach and I'll
>>> create the JIRA and pus
ss.com
> <http://burakisikli.wordpress.com>
>
>
--
*Olivier Girardot* | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
>
>> >> >> These are usable by adding this repository in your build and using a
>> >> >> snapshot version (e.g. 1.3.2-SNAPSHOT).
>> >> >>
>> >> >> 2. Nightly binary package builds and doc builds of master and
>> release
>> >> >> versions.
>> >> >>
>> >> >> http://people.apache.org/~pwendell/spark-nightly/
>> >> >>
>> >> >> These build 4 times per day and are tagged based on commits.
>> >> >>
>> >> >> If anyone has feedback on these please let me know.
>> >> >>
>> >> >> Thanks!
>> >> >> - Patrick
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>>
>
>
--
*Olivier Girardot* | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
n$8$$anon$1.next(Window.scala:252)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
2015-08-26 11:47 GMT+02:00 Olivier Girardot :
> Hi everyone,
> I know this "post title" doesn't seem very logical and I agree,
> we have a very complex computation usin
Sorry for the delay: yes, still.
I'm still trying to figure out if it comes from bad data and trying to
isolate the bug itself...
2015-09-11 0:28 GMT+02:00 Reynold Xin :
> Does this still happen on 1.5.0 release?
>
>
> On Mon, Aug 31, 2015 at 9:31 AM, Olivier Girardot
> wr