Re: Broadcast Variable Life Cycle

2016-08-30 Thread Jerry Lam
…destroy permanently closes the broadcast. On Tue, Aug 30, 2016 at 4:43 PM, Jerry Lam <chiling...@gmail.com> wrote: > Hi Sean, thank you for the response. The only problem is that actively managing broadcast variables requires returning the broadcast…
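For readers skimming this thread: a minimal sketch of what actively managing a broadcast variable looks like, contrasting unpersist with destroy. The sample job and names are illustrative only, not taken from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("bc-lifecycle").setMaster("local[*]"))

// Ship a small lookup table to every executor once.
val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))
val total = sc.parallelize(Seq("a", "b", "a")).map(k => lookup.value.getOrElse(k, 0)).sum()

// unpersist() drops the cached copies on the executors; if a later job reads
// lookup.value again, the value is re-sent from the driver.
lookup.unpersist()

// destroy() permanently closes the broadcast: all state is released on the
// driver and executors, and any further use of lookup.value throws.
lookup.destroy()
```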

Re: Broadcast Variable Life Cycle

2016-08-30 Thread Jerry Lam
…when the reference on the driver is garbage collected, but you usually would not want to rely on that. On Mon, Aug 29, 2016 at 4:30 PM, Jerry Lam <chiling...@gmail.com> wrote: > Hello spark developers, can anyone shed some light on the life cycle…

Re: Broadcast Variable Life Cycle

2016-08-29 Thread Jerry Lam
…Regards, Jerry. On Sun, Aug 21, 2016 at 1:07 PM, Jerry Lam <chiling...@gmail.com> wrote: > Hello spark developers, can someone explain the lifecycle of a broadcast variable? When will a broadcast variable be garbage-collected on the driver side and…

Broadcast Variable Life Cycle

2016-08-21 Thread Jerry Lam
Hello spark developers, can someone explain the lifecycle of a broadcast variable? When will a broadcast variable be garbage-collected on the driver side and on the executor side? Does a Spark application need to actively manage its broadcast variables to ensure that it will not run…

Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-02 Thread Jerry Lam
…they learn SQL-like languages. Do business schools teach SQL? Best Regards, Jerry. On Wed, Mar 2, 2016 at 10:03 AM, Steve Loughran <ste...@hortonworks.com> wrote: > On 1 Mar 2016, at 22:25, Jerry Lam <chiling...@gmail.com> wrote: > Hi Reynold,…

Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Jerry Lam
…Mar 1, 2016 at 9:35 AM, Alex Kozlov <ale...@gmail.com> wrote: >> Looked at the paper: while we can argue about the performance side, I think semantically the Scala pattern matching is much more expressive. Time will decide. On Tue, Mar 1, 2016 at 9:…

Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Jerry Lam
…https://issues.apache.org/jira/browse/FLINK-3215. On 1 March 2016 at 08:19, Jerry Lam <chiling...@gmail.com> wrote: >> Hi Herman, thank you for your reply! This functionality usually finds its place in financial services, which…

Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Jerry Lam
…looks like some sort of a window function with very awkward syntax. I think Spark provides better constructs for this using DataFrames/Datasets/nested data... Feel free to submit a PR. Kind regards, Herman van Hövell. 2016-03-01 15:16 GMT+0…

SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Jerry Lam
Hi Spark developers, will you consider adding support for implementing "Pattern matching in sequences of rows"? More specifically, I'm referring to this paper: http://web.cs.ucla.edu/classes/fall15/cs240A/notes/temporal/row-pattern-recogniton-11.pdf This is a very cool/useful feature to pattern…
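As a rough illustration of the DataFrame/window-function style that Herman's reply above contrasts this with, here is a hedged sketch of my own (not the paper's MATCH_RECOGNIZE proposal): it flags the bottom of a V-shaped price pattern, the canonical row-pattern-recognition example. Column names and data are invented, and it assumes a HiveContext-backed sqlContext, as in the spark-shell of that era, since 1.x window functions needed Hive support.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lag, lead}
import sqlContext.implicits._

// Toy ticker data: (symbol, tick order, price).
val ticks = sqlContext.createDataFrame(Seq(
  ("ACME", 1, 10.0), ("ACME", 2, 8.0), ("ACME", 3, 9.0)
)).toDF("symbol", "ts", "price")

val w = Window.partitionBy("symbol").orderBy("ts")

// A row is the bottom of a V-shape when it sits below both neighbours.
val vBottoms = ticks
  .withColumn("prev", lag("price", 1).over(w))
  .withColumn("next", lead("price", 1).over(w))
  .filter($"price" < $"prev" && $"price" < $"next")

vBottoms.show()
```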

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-23 Thread Jerry Lam
…would be nice if SparkContext were friendlier to a restart just as a matter of design. AFAIK it is; not sure about SQLContext though. If it's not a priority, it's just because this isn't a usual usage pattern, which doesn't mean it's crazy, just not the primary pattern.…

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-22 Thread Jerry Lam
…and then start a second context; it wasn't how Spark was originally designed, and I still see gotchas. I'd avoid it. I don't think you should have to release some resources; just keep the same context alive. On Tue, Dec 22, 2015 at 5:13 AM, Jerry Lam <chiling...@gmai…

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Jerry Lam
…Note that when sc is stopped, all resources are released (for example in YARN). On Dec 20, 2015, at 2:59 PM, Jerry Lam <chiling...@gmail.com> wrote: > Hi Spark developers, I found that SQLContext.getOrCreate(sc: SparkContext) does not behave correctly…

[Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-20 Thread Jerry Lam
Hi Spark developers, I found that SQLContext.getOrCreate(sc: SparkContext) does not behave correctly when a different Spark context is provided:
```
val sc = new SparkContext
val sqlContext = SQLContext.getOrCreate(sc)
sc.stop
...
val sc2 = new SparkContext
val sqlContext2 = SQLContext.getOrCreate(sc2)
```
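A hedged way to observe the reported behaviour against the Spark 1.5-era API; the identity check at the end is my addition, not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("ctx1").setMaster("local[*]"))
val sqlContext = SQLContext.getOrCreate(sc)
sc.stop()

val sc2 = new SparkContext(new SparkConf().setAppName("ctx2").setMaster("local[*]"))
val sqlContext2 = SQLContext.getOrCreate(sc2)

// If getOrCreate hands back the instance cached for the stopped context,
// this prints false and queries on sqlContext2 run against a dead context.
println(sqlContext2.sparkContext eq sc2)
```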

Re: Removing the Mesos fine-grained mode

2015-11-23 Thread Jerry Lam
…for the feature you mention? We intend to use Mesos, but it has proven difficult with our tight budget constraints. Best Regards, Jerry. On Nov 23, 2015, at 2:41 PM, Andrew Or <and...@databricks.com> wrote: > @Jerry Lam, can someone confirm if it is true th…

Re: Removing the Mesos fine-grained mode

2015-11-23 Thread Jerry Lam
@Andrew Or I assume you are referring to this ticket [SPARK-5095]: https://issues.apache.org/jira/browse/SPARK-5095 Thank you! Best Regards, Jerry. On Nov 23, 2015, at 2:41 PM, Andrew Or <and...@databricks.com> wrote:…

Re: Removing the Mesos fine-grained mode

2015-11-23 Thread Jerry Lam
Hi guys, can someone confirm whether it is true that dynamic allocation on Mesos "is designed to run one executor per slave with the configured amount of resources"? I copied this sentence from the documentation. Does this mean there is at most one executor per node? Therefore, if you have a big…
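For context, a hedged sketch of the configuration this question concerns (Spark 1.5/1.6-era setting names; the master URL and values are placeholders of mine):

```scala
import org.apache.spark.SparkConf

// Coarse-grained Mesos mode plus dynamic allocation: the combination whose
// "one executor per slave" behaviour is being asked about. Illustrative only.
val conf = new SparkConf()
  .setMaster("mesos://zk://host:2181/mesos")       // placeholder master URL
  .set("spark.mesos.coarse", "true")               // coarse-grained mode
  .set("spark.dynamicAllocation.enabled", "true")  // grow/shrink executors
  .set("spark.shuffle.service.enabled", "true")    // required by dynamic allocation
  .set("spark.cores.max", "8")                     // cap total cores claimed
```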

Re: Unchecked contribution (JIRA and PR)

2015-11-03 Thread Jerry Lam
Sergio, you are not alone, for sure. Check the RowSimilarity implementation [SPARK-4823]. It has been there for 6 months. It is very likely that contributions which don't get merged into the version of Spark they were developed against will never be merged, because Spark changes quite significantly from version to version if…

Re: Please reply if you use Mesos fine grained mode

2015-11-03 Thread Jerry Lam
We "used" Spark on Mesos to build interactive data analysis platform because the interactive session could be long and might not use Spark for the entire session. It is very wasteful of resources if we used the coarse-grained mode because it keeps resource for the entire session. Therefore,

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-21 Thread Jerry Lam
…<r...@databricks.com> wrote: > With Jerry's permission, sending this back to the dev list to close the loop. -- Forwarded message -- From: Jerry Lam <chiling...@gmail.com> Date: Tue, Oct 20, 2015 at 3:54 PM Subject: Re: If you use…

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-21 Thread Jerry Lam
…wrote: > Is this still Mesos fine-grained mode? On Wed, Oct 21, 2015 at 1:16 PM, Jerry Lam <chiling...@gmail.com> wrote: >> Hi guys, there is another memory issue. Not sure if this is related to Tungsten this time, because I have it disabled…

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-20 Thread Jerry Lam
I disabled it because of the "Could not acquire 65536 bytes of memory" error, which causes the job to fail. So for now, I'm not touching it. On Tue, Oct 20, 2015 at 4:48 PM, charmee wrote: > We had disabled Tungsten after we found a few performance issues, but had to enable it back…
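For reference, a hedged sketch of how Tungsten was toggled at the time, assuming the Spark 1.5-era spark.sql.tungsten.enabled flag (later releases removed it):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Sketch only: turn Tungsten off in Spark 1.5.x to work around the
// "Could not acquire 65536 bytes of memory" failure described above.
val conf = new SparkConf()
  .setAppName("tungsten-off")
  .set("spark.sql.tungsten.enabled", "false")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
```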

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-20 Thread Jerry Lam
…Oct 20, 2015 at 5:27 PM, Reynold Xin <r...@databricks.com> wrote: > Jerry - I think that's been fixed in 1.5.1. Do you still see it? On Tue, Oct 20, 2015 at 2:11 PM, Jerry Lam <chiling...@gmail.com> wrote: >> I disabled it because of the "Could not acquire 65…

Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-28 Thread Jerry Lam
Hi Spark Developers, the Spark 1.5.1 documentation is already publicly accessible (https://spark.apache.org/docs/latest/index.html), but the release is not. Is it intentional? Best Regards, Jerry. On Mon, Sep 28, 2015 at 9:21 AM, james wrote: > +1 > 1) Build binary…

Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Jerry Lam
Hi Spark Developers, I just ran some very simple operations on a dataset and was surprised by the execution plan of take(1), head(), and first(). For your reference, this is what I did in pyspark 1.5:
```
df = sqlContext.read.parquet("someparquetfiles")
df.head()
```
The above lines take over 15 minutes. I…
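Per the follow-ups below, the same two lines behave as expected in Scala, which is what pointed to a Python-specific issue; a hedged Scala equivalent of the report:

```scala
// Same repro in Scala (path taken from the report above). Per the thread,
// this returns quickly, fetching only as much data as head() needs.
val df = sqlContext.read.parquet("someparquetfiles")
df.head()
```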

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Jerry Lam
I just noticed you found 1.4 has the same issue; I added that to the ticket as well. On Mon, Sep 21, 2015 at 1:43 PM, Jerry Lam <chiling...@gmail.com> wrote: > Hi Yin, you are right! I just tried the Scala version with the above lines; it works as expected. I'm n…

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Jerry Lam
…0:01 AM, Yin Huai <yh...@databricks.com> wrote: >> Hi Jerry, looks like it is a Python-specific issue. Can you create a JIRA? Thanks, Yin. On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam <chiling...@gmail.com> wrote:…

Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-09 Thread Jerry Lam
Hi Spark Developers, I'm eager to try it out! However, I ran into problems resolving dependencies: [warn] [NOT FOUND] org.apache.spark#spark-core_2.10;1.5.0!spark-core_2.10.jar (0ms) [warn] jcenter: tried… When will the package be available? Best Regards, Jerry. On Wed, Sep 9, 2015 at…

Re: [survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-17 Thread Jerry Lam
Hi Nick, I forgot to mention in the survey that Ganglia is never installed properly, for some reason. I get this error every time I launch the cluster: Starting httpd: httpd: Syntax error on line 154 of /etc/httpd/conf/httpd.conf: Cannot load /etc/httpd/modules/mod_authz_core.so into…

Re: Spark Mesos Dispatcher

2015-07-19 Thread Jerry Lam
Yes. Sent from my iPhone. On 19 Jul, 2015, at 10:52 pm, Jahagirdar, Madhu <madhu.jahagir...@philips.com> wrote: All, can we run different versions of Spark using the same Mesos Dispatcher? For example, can we run drivers with Spark 1.3 and Spark 1.4 at the same time? Regards, Madhu

Re: Spark Mesos Dispatcher

2015-07-19 Thread Jerry Lam
…? -- From: Jerry Lam [chiling...@gmail.com] Sent: Monday, July 20, 2015 8:27 AM To: Jahagirdar, Madhu Cc: user; dev@spark.apache.org Subject: Re: Spark Mesos Dispatcher. Yes. Sent from my iPhone. On 19 Jul, 2015, at 10:52 pm, Jahagirdar, Madhu <madhu.jahagir...@philips.com> wrote…

Re: [PySpark DataFrame] When a Row is not a Row

2015-07-11 Thread Jerry Lam
Hi guys, I just hit the same problem. It is very confusing when Row is not the same Row type at runtime. The worst part is that when I use Spark in local mode, the Row is the same Row type, so it passes the test cases but fails when I deploy the application. Can someone suggest a…