Re: Unsubscribe

2020-08-26 Thread Annabel Melongo
how you unsubscribe. See here for instructions: https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e On Wed, Aug 26, 2020, 4:22 PM Annabel Melongo wrote: Please remove me from the mailing list

Unsubscribe

2020-08-26 Thread Annabel Melongo
Please remove me from the mailing list

unsubscribe

2019-12-14 Thread Annabel Melongo
unsubscribe

Re: DataFrame to read json and include raw Json in DataFrame

2016-12-29 Thread Annabel Melongo
t;, "fn") 3. Merge the two schemas and you'll get what you want. Thanks On Thursday, December 29, 2016 7:18 PM, Richard Xin wrote: thanks, I have seen this, but this doesn't cover my question. What I need is read json and include raw json as part of my dataframe.
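The snippet above is truncated, but the gist of the advice is: parse the JSON fields while also keeping each raw record as its own column, then merge the two. As a rough, Spark-free sketch of that idea in plain Python (the field names `fn`/`ln` and the sample records are made up for illustration):

```python
import json

# Hypothetical raw input lines, standing in for a JSON file read as text.
raw_lines = ['{"fn": "Ada", "ln": "Lovelace"}', '{"fn": "Alan", "ln": "Turing"}']

# Build rows that carry both the parsed fields and the untouched source
# string, mirroring the "merge the two schemas" suggestion in the thread.
rows = []
for line in raw_lines:
    parsed = json.loads(line)
    parsed["raw_json"] = line  # keep the original record alongside its fields
    rows.append(parsed)

print(rows[0]["fn"])  # prints Ada
```

In Spark the same effect can be had by reading the file once as plain text and once as JSON (or by parsing the text column), then keeping both columns in one DataFrame.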

Re: DataFrame to read json and include raw Json in DataFrame

2016-12-29 Thread Annabel Melongo
Richard, the documentation below shows you how to create a SparkSession and how to programmatically load data: Spark SQL and DataFrames - Spark 2.1.0 Documentation. On Thursday, December 29, 2016 5:16 PM, Ri

Re: trouble using eclipse to view spark source code

2016-01-18 Thread Annabel Melongo
Andy, This has nothing to do with Spark; I guess you don't have the proper Scala version. The version you're currently running doesn't recognize a method in Scala ArrayOps, namely: scala.collection.mutable.ArrayOps.$colon$plus On Monday, January 18, 2016 7:53 PM, Andy Davidson

Re: pre-install 3-party Python package on spark cluster

2016-01-11 Thread Annabel Melongo
When you run spark-submit in either client or cluster mode, you can use the options --packages or --jars to automatically copy your packages to the worker machines. Thanks On Monday, January 11, 2016 12:52 PM, Andy Davidson wrote: I use https://code.google.com/p/parallel-ssh/ to
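`--packages` and `--jars` are real spark-submit flags, but the master URL, Maven coordinate, class and jar names below are placeholders; treat this as a hedged sketch of the command shape, not a runnable job:

```shell
# --packages resolves Maven coordinates and ships them to driver and executors;
# --jars ships local jar files you already have on disk.
spark-submit \
  --master spark://master-host:7077 \
  --packages com.databricks:spark-csv_2.10:1.5.0 \
  --jars /path/to/extra-lib.jar \
  --class com.example.MyApp \
  my-app.jar
```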

Re: Date Time Regression as Feature

2016-01-07 Thread Annabel Melongo
Or he can also transform the whole date into a string On Thursday, January 7, 2016 2:25 PM, Sujit Pal wrote: Hi Jorge, Maybe extract things like dd, mm, day of week, time of day from the datetime string and use them as features? -sujit On Thu, Jan 7, 2016 at 11:09 AM, Jorge Machado
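The suggestion in this thread is to expand a datetime string into separate numeric features. A minimal sketch in plain Python (the feature names and the timestamp format are illustrative choices, not anything from the thread):

```python
from datetime import datetime

def date_features(ts: str) -> dict:
    """Expand an ISO-like timestamp string into simple numeric features."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "day_of_week": dt.weekday(),  # Monday == 0 ... Sunday == 6
        "hour": dt.hour,
    }

feats = date_features("2016-01-07 14:25:00")
print(feats["day_of_week"])  # prints 3 (a Thursday)
```

In Spark the same expansion is usually done per-column with the built-in date functions rather than with a Python UDF.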

Re: Spark job uses only one Worker

2016-01-07 Thread Annabel Melongo
Michael, I don't know what your environment is, but if it's Cloudera you should be able to see the link to your master in Hue. Thanks On Thursday, January 7, 2016 5:03 PM, Michael Pisula wrote: I had tried several parameters, including --total-executor-cores, no effect. As for the

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-05 Thread Annabel Melongo
Vijay, Are you closing the FileInputStream at the end of each loop (in.close())? My guess is those streams aren't closed, and thus the "too many open files" exception. On Tuesday, January 5, 2016 8:03 AM, Priya Ch wrote: Can some one throw light on this ? Regards, Padma Ch On Mon, Dec 2
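The thread's code is Java, but the failure mode is language-agnostic: every open handle consumes an OS file descriptor, and a long-running loop that never closes them eventually hits the process limit. A small Python analogy of the safe pattern (the temp files here are stand-ins for whatever the loop reads):

```python
import os
import tempfile

# Create a few empty temp files to read in a loop.
paths = []
for _ in range(3):
    fd, path = tempfile.mkstemp()
    os.close(fd)  # close the descriptor mkstemp itself opened
    paths.append(path)

contents = []
for path in paths:
    # The with-block guarantees f.close() runs even if read() raises,
    # so descriptors never accumulate across loop iterations.
    with open(path, "rb") as f:
        contents.append(f.read())

for path in paths:
    os.remove(path)

print(len(contents))  # prints 3
```

The Java equivalent of the with-block is try-with-resources, which closes the stream at the end of each iteration the same way.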

Re: Is Spark 1.6 released?

2016-01-04 Thread Annabel Melongo
[1] http://spark.apache.org/releases/spark-release-1-6-0.html [2] http://spark.apache.org/downloads.html On Monday, January 4, 2016 2:59 PM, "saif.a.ell...@wellsfargo.com" wrote: Where can I read more about the dataset api on a user layer? I am failing to get an API doc or understand

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Annabel Melongo
date them. I have filed SPARK-12565 to track this. Please let me know if there's anything else I can help clarify. Cheers, -Andrew 2015-12-29 13:07 GMT-08:00 Annabel Melongo : Andrew, Now I see where the confusion lies. Standalone cluster mode, your link, is nothing but a combination of clien

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Annabel Melongo
top of YARN. Please correct me if I'm wrong. Thanks On Tuesday, December 29, 2015 2:54 PM, Andrew Or wrote: http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications 2015-12-29 11:48 GMT-08:00 Annabel Melongo : Greg, Can you please send me a doc

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Annabel Melongo
client mode. 2015-12-29 11:32 GMT-08:00 Annabel Melongo : Greg, The confusion here is the expression "standalone cluster mode". Either it's stand-alone or it's cluster mode but it can't be both.  With this in mind, here's how jars are uploaded:    1. Spark Stand-alone m

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Annabel Melongo
Greg, The confusion here is the expression "standalone cluster mode". Either it's stand-alone or it's cluster mode but it can't be both.  With this in mind, here's how jars are uploaded:    1. Spark Stand-alone mode: client and driver run on the same machine; use --packages option to submit a ja

Re: Stuck with DataFrame df.select("select * from table");

2015-12-29 Thread Annabel Melongo
example is in Scala, then, I believe, semicolon is not required. -- Be well! Jean Morozov On Mon, Dec 28, 2015 at 8:49 PM, Annabel Melongo wrote: Jean, Try this: df.select("""select * from tmptable where x1 = '3.0'""").show(); Note: you have to use 3 dou

Re: DataFrame Vs RDDs ... Which one to use When ?

2015-12-28 Thread Annabel Melongo
Additionally, if you already have some valid SQL statements to process said data, then instead of reinventing the wheel with RDD functions you can speed up implementation by using DataFrames along with those existing SQL statements. On Monday, December 28, 2015 5:37 PM, Darren Govoni wrote

Re: Stuck with DataFrame df.select("select * from table");

2015-12-28 Thread Annabel Melongo
Jean, Try this: df.select("""select * from tmptable where x1 = '3.0'""").show(); Note: you have to use 3 double quotes as marked On Friday, December 25, 2015 11:30 AM, Eugene Morozov wrote: Thanks for the comments, although the issue is not in limit() predicate. It's something with spa
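The trick in this reply is Scala's triple-quoted string, which lets the SQL text contain single quotes without escaping. Python has the same construct, so a toy illustration (not Spark code) shows why it helps:

```python
# Two ways to embed the single-quoted literal '3.0' in a SQL string:
escaped = 'select * from tmptable where x1 = \'3.0\''
triple = """select * from tmptable where x1 = '3.0'"""

# Triple quoting just avoids the backslashes; the strings are identical.
print(escaped == triple)  # prints True
```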

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Annabel Melongo
Phone On 7 Dec 2015, at 18:50, Annabel Melongo wrote: Jia, I'm so confused on this. The architecture of Spark is to run on top of HDFS. What you're requesting, reading and writing to a C++ process, is not part of that requirement. On Monday, December 7, 2015 1:42 PM, Jia wr

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Annabel Melongo
On Monday, December 7, 2015 1:57 PM, Robin East wrote: Annabel Spark works very well with data stored in HDFS but is certainly not tied to it. Have a look at the wide variety of connectors to things like Cassandra, HBase, etc. Robin Sent from my iPhone On 7 Dec 2015, at 18:50, Annabel Melong

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Annabel Melongo
I have no intention to write and run Spark UDF in C++, I'm just wondering whether Spark can read and write data to a C++ process with zero copy. Best Regards,Jia  On Dec 7, 2015, at 12:26 PM, Annabel Melongo wrote: My guess is that Jia wants to run C++ on top of Spark. If that's t

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Annabel Melongo
My guess is that Jia wants to run C++ on top of Spark. If that's the case, I'm afraid this is not possible. Spark has support for Java, Python, Scala and R. The best way to achieve this is to run your application in C++ and use the data created by said application to do manipulation within Spark