Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-25 Thread Jacek Laskowski
only me with this mental issue) Unless I'm mistaken, -Pmesos won't get included in 2.0.x releases unless someone adds it to branch-2.0. Correct? Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-25 Thread Jacek Laskowski
Hi Sean, So, another question would be: when is the change going to be released, then? What's the version for master? The next release is 2.0.2, so it's not for the mesos profile either :(

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-25 Thread Jacek Laskowski
Hi Sean, Sure, but then the question is: why isn't it part of 2.0.1? I thought it was considered ready for prime time and so should ship in 2.0.1.

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-25 Thread Jacek Laskowski
Hi, That's even more interesting. How so, since the profile got added a week ago or later and RC2 was cut two or three days ago? Does anyone know?

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-24 Thread Jacek Laskowski
Hi, I keep asking myself why you guys are not including -Pmesos in your builds. Is this on purpose or have you overlooked it?

Re: Why Expression.deterministic method and Nondeterministic trait?

2016-09-24 Thread Jacek Laskowski
that there are places where you check whether an expression is deterministic using the method, not the trait. If so, that's the ambiguity I'm talking about. Anyway, I'm glad to learn from you guys! Thanks.

Re: @scala.annotation.varargs or @_root_.scala.annotation.varargs?

2016-09-24 Thread Jacek Laskowski
On Sat, Sep 24, 2016 at 5:27 AM, Hyukjin Kwon wrote: > Then, are we going to submit a PR and fix this maybe? https://issues.apache.org/jira/browse/SPARK-17656 Thanks Hyukjin! Unless someone beats me to it, I'm going to have a PR over the weekend. Jacek

Why Expression.deterministic method and Nondeterministic trait?

2016-09-23 Thread Jacek Laskowski
] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L80 [3] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L271
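The ambiguity can be illustrated with a simplified, hypothetical model; Expr, NonDet, Lit, Rand and Add are stand-ins, not Catalyst's classes. It assumes, as the linked Expression.scala lines appear to show, that `deterministic` defaults to all children being deterministic and that the marker trait overrides it to `false`. Under those assumptions an expression can be non-deterministic without mixing in the trait, so a trait check and a method check can disagree:

```scala
// Hypothetical, simplified model of Catalyst-style expressions (not Spark's classes).
sealed trait Expr {
  def children: Seq[Expr]
  // By default an expression is deterministic iff all its children are.
  def deterministic: Boolean = children.forall(_.deterministic)
}

// Marker trait: mixing it in forces deterministic to false.
trait NonDet extends Expr {
  final override def deterministic: Boolean = false
}

final case class Lit(value: Int) extends Expr {
  def children: Seq[Expr] = Nil
}

final case class Rand(seed: Long) extends NonDet {
  def children: Seq[Expr] = Nil
}

final case class Add(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}

object DeterminismDemo {
  val plain: Expr = Add(Lit(1), Lit(2))
  val mixed: Expr = Add(Lit(1), Rand(42L))

  // Check 1: "is it marked with the trait?"
  def byTrait(e: Expr): Boolean = !e.isInstanceOf[NonDet]
  // Check 2: "what does the method say?" (also sees non-deterministic children)
  def byMethod(e: Expr): Boolean = e.deterministic
}
```

Here `Add(Lit(1), Rand(42L))` passes the trait check yet reports `deterministic == false`, which is why the method looks like the safer of the two checks in this model.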

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-23 Thread Jacek Laskowski
+1 On Fri, Sep 23, 2016 at 8:01 AM, Reynold Xin <r...@databricks.com> wrote: > Please vote on

Deserializing InternalRow using a case class - how to avoid creating attrs manually?

2016-09-22 Thread Jacek Laskowski
r and I'll remember longer! :-)

Re: java.lang.NoClassDefFoundError, is this a bug?

2016-09-17 Thread Jacek Laskowski
(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) On Sat, Sep 17, 2016 at 11:08 PM, Xiang Gao

Re: Different versions of dependencies in assembly/target/scala-2.11/jars?

2016-09-17 Thread Jacek Laskowski
Hi Sean, Thanks a lot for helping me understand the different jars. Do you think there's anything that should be reported as an enhancement/issue/task in JIRA?

Different versions of dependencies in assembly/target/scala-2.11/jars?

2016-09-17 Thread Jacek Laskowski
-runtime-4.5.3.jar Even if that does not cause any class mismatches, it might be worth excluding them to minimize the size of the Spark distro. What do you think?

Why CatalogImpl.makeDataset and SparkSession.createDataset?

2016-09-12 Thread Jacek Laskowski

@scala.annotation.varargs or @_root_.scala.annotation.varargs?

2016-09-08 Thread Jacek Laskowski
and @scala.annotation.varargs only. WDYT?
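A minimal sketch of the two spellings, assuming the goal is Java-friendly varargs methods; the object and method names are hypothetical. Both annotations refer to the same `scala.annotation.varargs`; the `_root_.` prefix only makes a difference when an enclosing package has its own member named `scala` that would otherwise shadow the top-level package.

```scala
import scala.annotation.varargs

object Columns {
  // @varargs makes scalac also emit a Java-style varargs overload
  // (names(String, String...)), so Java callers can pass plain arguments.
  @varargs
  def names(first: String, rest: String*): Seq[String] = first +: rest

  // The same annotation, fully qualified from the root package.
  @_root_.scala.annotation.varargs
  def count(values: Int*): Int = values.size
}
```

Outside of a shadowing package, the short imported form and the `_root_` form compile to the same thing.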

Re: FileStreamSource source checks path eagerly?

2016-09-08 Thread Jacek Laskowski
way of thinking? On Thu, Sep 8, 2016 at 11:20 AM, Steve Loughran <ste...@hortonworks.com> wrote: >

Re: FileStreamSource source checks path eagerly?

2016-09-08 Thread Jacek Laskowski
On Thu, Sep 8, 2016 at 9:03 AM, Fred Reiss wrote: > I suppose the type-inference-time check for the presence of the input > directory could be moved to the FileStreamSource's initialization. But if > the directory isn't there when the source is being created, it probably >

FileStreamSource source checks path eagerly?

2016-09-08 Thread Jacek Laskowski
scala:153) ... 48 elided

Re: df.groupBy('m).agg(sum('n)).show dies with 10^3 elements?

2016-09-06 Thread Jacek Laskowski
with to fix/reproduce the issue, let me know. I wish I knew how to write a unit test for this. Where in the code should I look for inspiration?

df.groupBy('m).agg(sum('n)).show dies with 10^3 elements?

2016-09-06 Thread Jacek Laskowski
lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 0 ... Please see https://gist.github.com/jaceklaskowski/906d62b830f6c967a7eee5f8eb6e9237 and let me know if I should file an issue. I don't think 10^3 elements and groupBy should kill spark-shell.

Re: Reynold on vacation next two weeks

2016-08-30 Thread Jacek Laskowski
Hi, Definitely well deserved. Don't check your emails for the 2 weeks. Not even for a minute :-) Jacek On 30 Aug 2016 10:21 a.m., "Reynold Xin" wrote: > A lot of people have been pinging me on github and email directly and > expect instant reply. Just FYI I'm on vacation

Re: 3Ps for Datasets not available?! (=Parquet Predicate Pushdown)

2016-08-30 Thread Jacek Laskowski
? I'm tempted to say that for some data sources DataFrames are faster than Datasets... always. True? What am I missing? https://twitter.com/jaceklaskowski/status/770554918419755008 Thanks a lot, Reynold, for helping me get the gist of it all!

3Ps for Datasets not available?! (=Parquet Predicate Pushdown)

2016-08-30 Thread Jacek Laskowski
eScan parquet [id#196L,name#197] Batched: true, Format: ParquetFormat, InputPaths: file:/Users/jacek/dev/oss/spark/cities.parquet, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint,name:string>

Re: Mesos is now a maven module

2016-08-26 Thread Jacek Laskowski
Hi Michael, Congrats! BTW, what I like most about the change is that it uses the pluggable interface for TaskScheduler and SchedulerBackend (as introduced by YARN). I think Standalone should follow suit. WDYT?
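The pluggable approach can be sketched as a lookup that asks each registered cluster manager whether it can handle a master URL. This is a hypothetical model (ClusterManagerPlugin, MesosLikePlugin, YarnLikePlugin and PluginLookup are made-up names), assuming it works roughly like Spark's `ExternalClusterManager` and its `canCreate(masterURL)` check. Spark appears to discover implementations via `java.util.ServiceLoader`; this sketch uses a plain `Seq` instead.

```scala
// Hypothetical plug-in interface: each cluster manager says which master URLs it handles.
trait ClusterManagerPlugin {
  def name: String
  def canCreate(masterURL: String): Boolean
}

object MesosLikePlugin extends ClusterManagerPlugin {
  val name = "mesos"
  def canCreate(masterURL: String): Boolean = masterURL.startsWith("mesos://")
}

object YarnLikePlugin extends ClusterManagerPlugin {
  val name = "yarn"
  def canCreate(masterURL: String): Boolean = masterURL == "yarn"
}

object PluginLookup {
  // Return the single plugin that claims the URL; two or more claimants is an error.
  def find(url: String, plugins: Seq[ClusterManagerPlugin]): Option[ClusterManagerPlugin] = {
    val matching = plugins.filter(_.canCreate(url))
    require(matching.size <= 1, s"Multiple cluster managers registered for the url $url")
    matching.headOption
  }
}
```

A Standalone implementation would then be one more plugin claiming `spark://` URLs, rather than a special case in the core.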

Re: Spark dev-setup

2016-08-24 Thread Jacek Laskowski
On Wed, Aug 24, 2016 at 2:32 PM, Steve Loughran wrote: > no reason; the key thing is : not in cluster mode, as there your work happens > elsewhere Right! Anything but cluster mode should make it easy (that leaves us with local). Jacek

Re: Spark dev-setup

2016-08-24 Thread Jacek Laskowski
On Wed, Aug 24, 2016 at 11:13 AM, Steve Loughran wrote: > I'd recommend ...which I mostly agree with, with some exceptions :) > -stark spark standalone from there Why Spark Standalone, since the OP asked about "learning how query execution flow occurs in Spark SQL"? How

Analyzer.resolver a duplicate of CatalystConf.resolver?

2016-08-22 Thread Jacek Laskowski
fixes). [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L67-L73 [2] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystConf.scala#L43-L45
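Assuming both linked snippets choose between a case-sensitive and a case-insensitive comparison based on the case-sensitivity setting, the duplication could collapse into one helper. A minimal sketch: Catalyst does define `type Resolver = (String, String) => Boolean`, but `Resolution` and `resolverFor` are hypothetical names.

```scala
object Resolution {
  // A resolver decides whether two identifiers name the same thing.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Hypothetical single source of truth: an analyzer and a config object
  // could both delegate here instead of each repeating the if/else.
  def resolverFor(caseSensitiveAnalysis: Boolean): Resolver =
    if (caseSensitiveAnalysis) caseSensitive else caseInsensitive
}
```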

Why is isStreaming naming-inconsistent with analyzed and resolved in LogicalPlan?

2016-08-22 Thread Jacek Laskowski
ke to hear the real reason). Thanks. [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L46

Found a typo in Catalyst's exception and want to write a test -- help needed

2016-08-18 Thread Jacek Laskowski
class that I could extend to assert the exception's message? How do I run the test? Is sbt catalyst/testOnly [testName] enough? Please guide. Thanks. [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala#L249

How is mapped LogicalPlan to RDDs eventually if ever? How about Dataset?

2016-08-17 Thread Jacek Laskowski

Re: [master] ERROR RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)

2016-08-17 Thread Jacek Laskowski
On Tue, Aug 16, 2016 at 10:51 PM, Yin Huai wrote: > Do you want to try it? Yes, indeed! I'd be more than happy. Guide me if you don't mind. Thanks. Should I create a JIRA for this? Jacek

[master] ERROR RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)

2016-08-16 Thread Jacek Laskowski
891) ... res1: org.apache.spark.sql.Dataset[A] = [id: int] scala> spark.version res2: String = 2.1.0-SNAPSHOT See the complete stack trace at https://gist.github.com/jaceklaskowski/a969fdd5c2c9cdb736bf647b01257a3e. I'm quite positive that it didn't happen a day or two ago.

Re: GraphFrames 0.2.0 released

2016-08-16 Thread Jacek Laskowski
Hi Tim, AWESOME. Thanks a lot for releasing it. That makes me even more eager to see it in Spark's codebase (and replacing the current RDD-based API)!

Re: Spark 2.0.1 / 2.1.0 on Maven

2016-08-15 Thread Jacek Laskowski
Thanks Sean. That reflects my sentiments so well! On Mon, Aug 15, 2016 at 1:08 AM, Sean Owen <so...@cloudera.

Re: Spark 2.0.1 / 2.1.0 on Maven

2016-08-14 Thread Jacek Laskowski
s and PMCs should do, not users: "Do not include any links on the project website that might encourage non-developers to download and use nightly builds, snapshots, release candidates, or any other similar package."

Re: Spark SQL and Kryo registration

2016-08-04 Thread Jacek Laskowski
Hi Olivier, I don't know either, but am curious what you've tried already. Jacek On 3 Aug 2016 10:50 a.m., "Olivier Girardot" < o.girar...@lateral-thoughts.com> wrote: > Hi everyone, > I'm currently to use Spark 2.0.0 and making Dataframes work with kryo. > registrationRequired=true > Is it

[YARN] YarnAllocator.updateResourceRequests -- could be simpler and faster, too?

2016-07-31 Thread Jacek Laskowski
la#L355 [3] https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L360 [4] https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L200

[YARN] Question about ApplicationMaster's shutdown hook (priority)

2016-07-30 Thread Jacek Laskowski
om/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L205 [4] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala#L146-L147
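Assuming the linked ShutdownHookManager runs hooks by priority (higher priority first), the ordering question can be modelled with a priority queue. This is a hypothetical sketch, not Spark's class, and the priority numbers below are illustrative only.

```scala
import scala.collection.mutable

// Hypothetical registry modelling priority-ordered shutdown hooks:
// a higher priority runs earlier; equal priorities run in registration order.
final class PrioritizedHooks {
  private final case class Hook(priority: Int, seq: Int, body: () => Unit)

  private var counter = 0
  // PriorityQueue dequeues the greatest element first, so order by
  // (priority, -seq): highest priority first, then earliest registration.
  private val queue = mutable.PriorityQueue.empty[Hook](
    Ordering.by((h: Hook) => (h.priority, -h.seq)))

  def add(priority: Int)(body: => Unit): Unit = {
    queue.enqueue(Hook(priority, counter, () => body))
    counter += 1
  }

  // Drain the queue, running each hook exactly once.
  def runAll(): Unit =
    while (queue.nonEmpty) queue.dequeue().body()
}
```

For example, hooks added with priorities 100, 50 and 49 run in that order, so a hook registered one below another hook's priority runs right after it.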

Re: Internal Deprecation warnings - worth fixing?

2016-07-27 Thread Jacek Laskowski
Kill 'em all -- one by one slowly yet gradually! :) On Wed, Jul 27, 2016 at 9:11 PM, Holden Karau <

Renaming spark.driver.appUIAddress to spark.yarn.driver.appUIAddress?

2016-07-26 Thread Jacek Laskowski
hts before filing a JIRA issue.

Re: Build error

2016-07-22 Thread Jacek Laskowski
Hi, Fixed now. git pull and start over. https://github.com/apache/spark/commit/e1bd70f44b11141b000821e9754efeabc14f24a5

BUILD broken - one hotfix ready for merging

2016-07-21 Thread Jacek Laskowski
Hi, It seems that the current master is broken in two places. I've just sent a PR for the first one. Please review and merge. https://github.com/apache/spark/pull/14315

Re: spark-packages with maven

2016-07-15 Thread Jacek Laskowski
+1000 Thanks Ismael for bringing this up! I meant to send it earlier too, since I've been struggling with an sbt-based Scala project for a Spark package myself this week and haven't yet found out how to do local publishing. If such a guide existed for Maven, I could easily use it for sbt too

Re: Why isnt spark-yarn module is excluded from the spark parent pom?

2016-07-13 Thread Jacek Laskowski
And the reason is that not all Spark installations use YARN as the cluster manager. Jacek On 13 Jul 2016 9:23 a.m., "Sean Owen" wrote: > It's activated by a profile called 'yarn', like several other modules. > > On Wed, Jul 13, 2016 at 5:15 AM, Niranda Perera >

Re: Stopping Spark executors

2016-07-08 Thread Jacek Laskowski
Hi, Read the doc at http://spark.apache.org/docs/latest/spark-standalone.html, since Spark Standalone seems to be the cluster manager the OP uses.

Re: Stopping Spark executors

2016-07-07 Thread Jacek Laskowski
Hi, It appears you're running in local mode (local[*] assumed), so killing spark-shell *will* kill the one and only executor -- the driver :)

Re: Stopping Spark executors

2016-07-07 Thread Jacek Laskowski
Hi, Use jps -lm to see the processes on the machine(s) to kill. On Wed, Jul 6, 2016 at 9:49 PM, Mr rty ff <y

Re: Why's ds.foreachPartition(println) not possible?

2016-07-06 Thread Jacek Laskowski
Thanks Cody, Reynold, and Ryan! Learnt a lot and feel "corrected". On Wed, Jul 6, 2016 at 2:46 AM, Shixiong

Re: spark git commit: [SPARK-15204][SQL] improve nullability inference for Aggregator

2016-07-05 Thread Jacek Laskowski
On Mon, Jul 4, 2016 at 6:14 AM, wrote: > Repository: spark > Updated Branches: > refs/heads/master 88134e736 -> 8cdb81fa8 > > > [SPARK-15204][SQL] improve nullability inference for Aggregator > > ## What changes were proposed in this pull request? > >

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
Hi Reynold, Is this already reported and tracked somewhere? I'm quite sure that people will be asking about the reasons Spark does this. Where are such issues usually reported?

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
Scala I want to use the Scala API, right). It appears that all single-argument-function operators in Datasets are affected :( My question was whether there's any work underway to fix it (if possible -- I don't know if it is).

Re: Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
ds is a Dataset, and the problem is that println (or any other single-argument function) does not work here (and perhaps with other methods that have two variants, Java's and Scala's).

Why's ds.foreachPartition(println) not possible?

2016-07-05 Thread Jacek Laskowski
unc: org.apache.spark.api.java.function.ForeachPartitionFunction[Record])Unit (f: Iterator[Record] => Unit)Unit cannot be applied to (Unit) ds.foreachPartition(println) ^ scala> sc.version res9: String = 2.0.0-SNAPSHOT

Re: [jira] [Resolved] (SPARK-16345) Extract graphx programming guide example snippets from source files instead of hard code them

2016-07-03 Thread Jacek Laskowski
On Sun, Jul 3, 2016 at 3:49 PM, Sean Owen <so...@cloudera.com> wrote: > On Sun, Jul 3, 2016 at 2:42 PM, Jacek Laskowski <ja...@japila.pl> wrote: >> 2. Add new features to master (versions - master: 2.0.0-SNAPSHOT >> branch: 2.0.0-RC1) > > Either: > a) you proh

Re: [jira] [Resolved] (SPARK-16345) Extract graphx programming guide example snippets from source files instead of hard code them

2016-07-03 Thread Jacek Laskowski
g? I must be missing something, but can't see it. You're right, it has nothing to do with the pace of releases, but the project needs frequent releases, say quarterly.

Re: [jira] [Resolved] (SPARK-16345) Extract graphx programming guide example snippets from source files instead of hard code them

2016-07-03 Thread Jacek Laskowski
k On 3 Jul 2016 2:59 a.m., "Reynold Xin" <r...@databricks.com> wrote: > Because in that case you cannot merge anything meant for 2.1 until 2.0 is > released. > > On Saturday, July 2, 2016, Jacek Laskowski <ja...@japila.pl> wrote: > >> Hi, >> >&

Re: [jira] [Resolved] (SPARK-16345) Extract graphx programming guide example snippets from source files instead of hard code them

2016-07-02 Thread Jacek Laskowski
Hi, Always release from master. What could be the gotchas? On Sat, Jul 2, 2016 at 11:36 PM, Sean Owen <

Re: [jira] [Resolved] (SPARK-16345) Extract graphx programming guide example snippets from source files instead of hard code them

2016-07-02 Thread Jacek Laskowski
Hi, Thanks Sean! It makes sense. I'm not fully convinced that's how it should be, so I apologize if I ever ask about the version management in Spark again :)

Re: What's the meaning of Target Version/s in Spark's JIRA?

2016-06-28 Thread Jacek Laskowski
Hi, That makes sense. Thanks Dongjoon for the very prompt response! On Tue, Jun 28, 2016 at 6:58 PM, Dongjoon Hyun

What's the meaning of Target Version/s in Spark's JIRA?

2016-06-28 Thread Jacek Laskowski
Hi, While reviewing the release notes for 1.6.2 I stumbled upon https://issues.apache.org/jira/browse/SPARK-13522. It's got Target Version/s: 2.0.0 with Fix Version/s: 1.6.2, 2.0.0. What's the meaning of Target Version/s in Spark?

Re: Using SHUFFLE_SERVICE_ENABLED for MesosCoarseGrainedSchedulerBackend, BlockManager, and Utils?

2016-06-27 Thread Jacek Laskowski
Thanks Sean. I'm going to create a JIRA for it and start the work under it. On Mon, Jun 27, 2016 at 9:19 AM, Sean Owen

Using SHUFFLE_SERVICE_ENABLED for MesosCoarseGrainedSchedulerBackend, BlockManager, and Utils?

2016-06-26 Thread Jacek Laskowski
/mesos/MesosCoarseGrainedSchedulerBackend.scala#L71 [3] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L73-L74 [4] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L748
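A sketch of the consolidation, assuming the aim is to read `spark.shuffle.service.enabled` (default `false`) through one shared, typed entry rather than repeating the key and its default in each component. `ConfEntry` and `Conf` are hypothetical stand-ins, not Spark's internal config API:

```scala
// Hypothetical typed config entry: the key, its default and its parser live in one place.
final case class ConfEntry[T](key: String, default: T, parse: String => T) {
  def readFrom(settings: Map[String, String]): T =
    settings.get(key).map(parse).getOrElse(default)
}

object Conf {
  // Declared once and shared, instead of repeating the key string and its
  // default in every component that reads it.
  val ShuffleServiceEnabled: ConfEntry[Boolean] =
    ConfEntry("spark.shuffle.service.enabled", false, _.trim.toBoolean)
}
```

Each of the three call sites would then read `Conf.ShuffleServiceEnabled.readFrom(settings)`, so a change to the key or default happens in one place.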

Does CoarseGrainedSchedulerBackend care about cores only? And disregards memory?

2016-06-23 Thread Jacek Laskowski
Spark application for execution both -- memory and cores -- can be specified explicitly. Would you agree? Am I missing anything important? I was very surprised when I found this out, as I thought that memory would also be a limiting factor.
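The observation can be modelled as follows, assuming each resource offer carries only an executor id and its free cores, and that a task is placed only where at least `cpusPerTask` cores are free (the role `spark.task.cpus` plays). `Offer` and `CoreOnlyScheduling` are hypothetical; the point is that memory never enters the decision:

```scala
// Hypothetical resource offer: an executor id and its free cores, no memory.
final case class Offer(executorId: String, freeCores: Int)

object CoreOnlyScheduling {
  // Place up to `numTasks` tasks round-robin across the offers. A task fits
  // wherever at least `cpusPerTask` cores are free; memory is never checked.
  // Returns the executor id chosen for each launched task, in launch order.
  def assign(offers: Seq[Offer], numTasks: Int, cpusPerTask: Int): Seq[String] = {
    require(cpusPerTask > 0, "cpusPerTask must be positive")
    val free = offers.map(_.freeCores).toArray
    val placed = Seq.newBuilder[String]
    var remaining = numTasks
    var launchedInRound = true
    while (remaining > 0 && launchedInRound) {
      launchedInRound = false
      for (i <- offers.indices if remaining > 0 && free(i) >= cpusPerTask) {
        free(i) -= cpusPerTask
        placed += offers(i).executorId
        remaining -= 1
        launchedInRound = true
      }
    }
    placed.result()
  }
}
```

With offers of 4 and 2 free cores and `cpusPerTask = 2`, only three tasks launch, no matter how much memory the executors have.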

How to explain SchedulerBackend.reviveOffers()?

2016-06-20 Thread Jacek Laskowski
`backend.reviveOffers()`? p.s. I understand that it's somehow related to how Mesos manages resources by offering them, but I can't find anything related to `reviving offers` in the Mesos docs :( Please guide. Thanks!

Re: [VOTE] Release Apache Spark 1.6.2 (RC1)

2016-06-18 Thread Jacek Laskowski
+1 On Sat, Jun 18, 2016 at 9:13 AM, Reynold Xin <r...@databricks.com> wrote: > Looks like that's resolved n

Re: Spark 2.0 Dataset Documentation

2016-06-18 Thread Jacek Laskowski
On Sat, Jun 18, 2016 at 6:13 AM, Pedro Rodriguez wrote: > using Datasets (eg using $ to select columns). Or even my favourite one - the tick ` :-) Jacek

Re: cutting 1.6.2 rc and 2.0.0 rc this week?

2016-06-16 Thread Jacek Laskowski
That'd be awesome to have another 2.0 RC! I know many people who'd consider it a call to action to play with 2.0. +1000

Re: [YARN] Small fix for yarn.Client to use buildPath (not Path.SEPARATOR)

2016-06-14 Thread Jacek Laskowski
. And only then could I work on https://issues.apache.org/jira/browse/YARN-5247. Is this about changing the annotation(s) only? Thanks for your support!

[YARN] Small fix for yarn.Client to use buildPath (not Path.SEPARATOR)

2016-06-13 Thread Jacek Laskowski
] https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1298 [2] Path.SEPARATOR
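A minimal sketch of the suggested helper, assuming `buildPath` simply joins components with Hadoop's `Path.SEPARATOR` (which is `"/"`). The object name `PathJoin` is hypothetical, and the separator is inlined to avoid a Hadoop dependency:

```scala
object PathJoin {
  // Hadoop's org.apache.hadoop.fs.Path.SEPARATOR is "/"; it is inlined here
  // so this sketch has no Hadoop dependency.
  private val Separator = "/"

  // One helper joins path components, so call sites never concatenate
  // components and separators by hand.
  def buildPath(components: String*): String = components.mkString(Separator)
}
```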

Re: Welcoming Yanbo Liang as a committer

2016-06-04 Thread Jacek Laskowski
Hi, Congrats Yanbo! p.s. It should go to user@, too. Jacek On 4 Jun 2016 4:49 a.m., "Matei Zaharia" wrote: Hi all, The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been a super active contributor in many areas of MLlib. Please join me in welcoming

SPARK_YARN_MODE, yarn-client master URL and SparkILoop

2016-06-02 Thread Jacek Laskowski
nt (and yarn-cluster) are no longer in use, I'm pretty sure it's of no use and could be safely removed. If not, we should do something with it anyway. Please guide before I file a JIRA issue. Thanks. p.s. On to hunting SPARK_YARN_MODE...

Re: How to access the off-heap representation of cached data in Spark 2.0

2016-05-29 Thread Jacek Laskowski
On Sun, May 29, 2016 at 5:30 PM, jpivar...@gmail.com wrote: > If I find a way to provide > access by modifying Spark source, can I just submit a pull request, or do I > need to be a recognized Spark developer? If so, is there a process for > becoming one? Start a discussion

Re: How to access the off-heap representation of cached data in Spark 2.0

2016-05-28 Thread Jacek Laskowski
my limited understanding of the things (and I'm not even sure how trustworthy it is). Use with extreme caution. On Sat, May 28, 20

LiveListenerBus with started and stopped flags? Why both?

2016-05-25 Thread Jacek Laskowski

TaskSchedulerImpl#initialize - why is rootPool initialized here not while TaskSchedulerImpl is created?

2016-05-06 Thread Jacek Laskowski
? [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L131-L142

Re: spark git commit: [HOTFIX] Fix the problem for real this time.

2016-04-25 Thread Jacek Laskowski
] Thanks Reynold! :) On Tue, Apr 26, 2016 at 6:38 AM

Re: spark git commit: [HOTFIX] Fix compilation

2016-04-25 Thread Jacek Laskowski
]^ [error] one error found [error] Compile failed at Apr 26, 2016 6:28:01 AM [0.449s] On Tue, Apr 26, 2016 at 6

Re: [spark.ml] Why is private class ColumnPruner?

2016-04-19 Thread Jacek Laskowski
Hi Yanbo, https://issues.apache.org/jira/browse/SPARK-14730 Thanks! On Tue, Apr 19, 2016 at 8:55 AM, Yanbo Liang

More elaborate toString for StreamExecution?

2016-04-18 Thread Jacek Laskowski
ing it could have.

Re: Implicit from ProcessingTime to scala.concurrent.duration.Duration?

2016-04-18 Thread Jacek Laskowski
When you say "in the future", do you have any specific timeframe in mind? You got me curious :) On Mon, Ap

Implicit from ProcessingTime to scala.concurrent.duration.Duration?

2016-04-18 Thread Jacek Laskowski
it's not a release feature, I didn't mean to file an issue in JIRA; please guide if needed).
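A sketch of the proposed implicit, assuming a trigger that stores its interval as `intervalMs: Long`. `ProcessingTimeLike` is a hypothetical stand-in, not Spark's `ProcessingTime`; the conversion lives in its companion object so it is found wherever a `FiniteDuration` is expected:

```scala
import scala.concurrent.duration._
import scala.language.implicitConversions

// Hypothetical stand-in for a trigger that keeps its interval in milliseconds.
final case class ProcessingTimeLike(intervalMs: Long)

object ProcessingTimeLike {
  // Lets a trigger be used wherever a FiniteDuration is expected.
  implicit def toDuration(t: ProcessingTimeLike): FiniteDuration =
    t.intervalMs.millis
}
```

With it in scope, `val d: FiniteDuration = ProcessingTimeLike(1500L)` compiles and yields 1500 milliseconds.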

Dataset.explain, ExplainCommand and sqlContext.executePlan twice?

2016-04-13 Thread Jacek Laskowski
? It appears that we call the former to execute the latter (?) I'm confused. Please explain :) I'd appreciate it.

Re: [BUILD FAILURE] Spark Project ML Local Library - me or it's real?

2016-04-09 Thread Jacek Laskowski
/961M [INFO] Thank you so much for the prompt solution! And that's while I was driving from Toronto to Mississauga. Thanks!

[BUILD FAILURE] Spark Project ML Local Library - me or it's real?

2016-04-09 Thread Jacek Laskowski
ark-mllib-local_2.11

Re: BROKEN BUILD? Is this only me or not?

2016-04-05 Thread Jacek Laskowski
OS X ➜ spark git:(master) ✗ java -version java version "1.8.0_77" Java(TM) SE Runtime Environment (build 1.8.0_77-b03) Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)

BROKEN BUILD? Is this only me or not?

2016-04-05 Thread Jacek Laskowski
https://github.com/apache/spark/commit/c59abad052b7beec4ef550049413e95578e545be. Is this a real issue with the build now or is this just me? I may have seen a similar case before, but can't remember what the fix was. Looking into it.

Re: [STREAMING] DStreamClosureSuite.scala with { return; ssc.sparkContext.emptyRDD[Int] } Why?!

2016-04-05 Thread Jacek Laskowski
Hi Ted, Yeah, I saw the line, but forgot it's a test that may be checking that closures must not contain a return statement. It's clearer now. Thanks!
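Why a test like that makes sense: in Scala 2.x, which Spark 2.0 builds against, a `return` inside a closure is a non-local return from the enclosing method. It is implemented by throwing `scala.runtime.NonLocalReturnControl`. If the closure runs after that method has already returned, nothing catches the exception. A Spark closure executed later on an executor would be in exactly that situation.

The following is a minimal pure-Scala sketch of the behaviour. `ReturnInClosureDemo`, `makeClosure` and `escapes` are hypothetical names used only for illustration. As far as I can tell, Spark's closure cleaning rejects closures containing `return` up front, and that is presumably what the test checks.

```scala
import scala.runtime.NonLocalReturnControl

// Hypothetical demo object, not part of Spark.
object ReturnInClosureDemo {
  // Builds a closure whose body contains `return`. That `return` targets
  // `makeClosure` itself (a non-local return), not the closure.
  def makeClosure(): () => Int = {
    () => { return () => -1; 42 }
  }

  // Calls the closure after `makeClosure` has finished. The non-local return
  // then surfaces as an uncaught NonLocalReturnControl, which we catch here.
  def escapes(): Boolean = {
    val f = makeClosure() // returns normally: the closure body has not run yet
    try { f(); false }
    catch { case _: NonLocalReturnControl[_] => true }
  }
}
```

Newer Scala 3 versions deprecate non-local returns in favour of `scala.util.boundary`, which makes the same point from the language side.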

[STREAMING] DStreamClosureSuite.scala with { return; ssc.sparkContext.emptyRDD[Int] } Why?!

2016-04-05 Thread Jacek Laskowski
it.

error: reference to sql is ambiguous after import org.apache.spark._ in shell?

2016-04-04 Thread Jacek Laskowski
implicits._ (52 terms, 31 are implicit)
2) import sqlContext.sql (1 terms)
scala> sc.version
res19: String = 2.0.0-SNAPSHOT
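The ambiguity in the subject line follows from Scala's name-binding precedence. The shell's explicit `import sqlContext.sql` outranks a wildcard import. A later wildcard such as `import org.apache.spark._` also brings in a `sql` member (the `org.apache.spark.sql` package), but as an inner wildcard import it cannot shadow the outer explicit one, so `sql` becomes ambiguous.

The following is a minimal pure-Scala sketch of that situation. `sparkLike` and `sqlContextLike` are hypothetical stand-ins, not Spark APIs. The two methods show two ways around the ambiguity: qualifying the call, or renaming the conflicting member in the wildcard import.

```scala
// Hypothetical stand-in for a package that has a member named `sql`.
object sparkLike {
  object sql { val kind = "package-like member" }
}

// Hypothetical stand-in for a context object with a `sql` method.
object sqlContextLike {
  def sql(query: String): String = s"running: $query"
}

object AmbiguousImportDemo {
  import sqlContextLike.sql // explicit import in the outer scope

  def qualified(): String = {
    import sparkLike._ // wildcard import in an inner scope
    // sql("SELECT 1") // would not compile: reference to sql is ambiguous
    sqlContextLike.sql("SELECT 1") // workaround 1: qualify the call
  }

  def renamed(): String = {
    // workaround 2: rename the conflicting member, import the rest
    import sparkLike.{sql => sparkSql, _}
    sql("SELECT 1") // resolves to the outer explicit import again
  }
}
```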

Re: explain codegen

2016-04-03 Thread Jacek Laskowski
Hi, It looks related to the recent commit... Repository: spark Updated Branches: refs/heads/master 2262a9335 -> 1f0c5dceb [SPARK-14350][SQL] EXPLAIN output should be in a single cell Jacek On 03.04.2016 7:00 PM, "Ted Yu" wrote: > Hi, > Based on master branch refreshed

[SQL] Dataset.map gives error: missing parameter type for expanded function?

2016-04-03 Thread Jacek Laskowski
g? Please guide.

Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-28 Thread Jacek Laskowski
if at least some > traits from org.apache.spark.ml.param.shared.sharedParams were > public? HasInputCol(s) and HasOutputCol, for example. These are useful > pretty much every time you create a custom Transformer. > > -- > Regards, > Maciej Szymkiewicz > > > On 03

Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Jacek Laskowski
(and this discussion is a sign that the process has not been conducted properly, as people, myself included, have concerns). Thanks, Mridul!

Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-26 Thread Jacek Laskowski
On Sat, Mar 26, 2016 at 3:23 AM, Joseph Bradley <jos...@databricks.com> wrote: > There have been some comments ab

[spark.ml] Why is private class ColumnPruner?

2016-03-25 Thread Jacek Laskowski

Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-25 Thread Jacek Laskowski
Hi, After a few weeks with spark.ml, I've come to the conclusion that the Transformer concept from the Pipeline API (spark.ml/MLlib) should be part of DataFrame (SQL), where it fits better. Are there any plans to migrate the Transformer API (ML) to DataFrame (SQL)?

[ml] Two ClassificationModels are final and two are not - why?

2016-03-24 Thread Jacek Laskowski
l` (`final`)
** `LogisticRegressionModel`
** `NaiveBayesModel`

Re: Spark structured streaming

2016-03-08 Thread Jacek Laskowski
Hi Praveen, I don't really know. I think TD or Michael should know, as they were personally involved in the task (as far as I could figure out from the JIRA and the changes). Ping people on the JIRA so they notice your question(s).

Re: Spark structured streaming

2016-03-08 Thread Jacek Laskowski
Hi Praveen, I've spent a few hours on the changes related to streaming DataFrames (included in SPARK-8360) and concluded that it's currently only possible to read.stream(), not write.stream(), since there are no streaming Sinks yet.

Re: BUILD FAILURE due to...Unable to find configuration file at location dev/scalastyle-config.xml

2016-03-08 Thread Jacek Laskowski
Hi, At first glance, it appears that the commit from *yesterday* (Warsaw time) broke the build :( https://github.com/apache/spark/commit/0eea12a3d956b54bbbd73d21b296868852a04494
