Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-25 Thread Jacek Laskowski
Hi Sean, Sure, but then the question is why it's not a part of 2.0.1? I thought it was considered ready for prime time and so should be shipped in 2.0.1. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Foll

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-25 Thread Jacek Laskowski
Hi Sean, So, another question would be when is the change going to be released then? What's the version for the master? The next release's 2.0.2 so it's not for mesos profile either :( Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-25 Thread Jacek Laskowski
eally hope it's only me with this mental issue) Unless I'm mistaken, -Pmesos won't get included in 2.0.x releases unless someone adds it to branch-2.0. Correct? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-a

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-25 Thread Jacek Laskowski
+1 Ship it! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Sun, Sep 25, 2016 at 12:08 AM, Reynold Xin wrote: > Please vote on releasing the follow

Should LeafExpression have children final override (like Nondeterministic)?

2016-09-27 Thread Jacek Laskowski
ng is that LeafExpression is to mark leaf expressions so children is assumed to be Nil. Should children be final in LeafExpression? Why not? #curious Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow m

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-29 Thread Jacek Laskowski
code does not get compiled unless you enable the profile explicitly. I've learnt it's not part of the release, though. Thanks for all the clarifications! I appreciate your patience dealing with my questions a lot! Thanks. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/

Dynamic allocation / killing executors work? Perhaps it's just web UI?

2016-09-29 Thread Jacek Laskowski
rs as ACTIVE in Status column in Executors tab. Can anyone confirm that dynamic allocation works fine and web UI shows the current status of executors? What other information do you want me to offer to verify it? I'm doubtful that web UI shows what it's supposed to show regarding exec

DAGScheduler.handleJobCancellation uses jobIdToStageIds for verification while jobIdToActiveJob for lookup?

2016-10-13 Thread Jacek Laskowski
/spark/scheduler/DAGScheduler.scala#L1372 [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1376 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow

Re: DAGScheduler.handleJobCancellation uses jobIdToStageIds for verification while jobIdToActiveJob for lookup?

2016-10-13 Thread Jacek Laskowski
Thanks Imran! Not only did the response come so promptly, but also it's something I could work on (and have another Spark contributor badge unlocked)! Thanks. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-

Redundant method in SparkUI and entire SparkUITab?

2016-10-23 Thread Jacek Laskowski
your comments to learn Spark better. Thanks. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski ---

[info] Warning: Unknown ScalaCheck args provided: -oDF

2016-10-29 Thread Jacek Laskowski
Hi, Just noticed the messages from the recent build of my pull request in Jenkins: [info] Warning: Unknown ScalaCheck args provided: -oDF I think we should fix it, right? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering

withExpr private method duplication in Column and functions objects?

2016-11-11 Thread Jacek Laskowski
/org/apache/spark/sql/Column.scala#L152 [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L60 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at

ShuffleExchange#nodeName...a duplication...perhaps?!

2016-11-12 Thread Jacek Laskowski
cation perhaps? (Makes reading the code slightly more involved). [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala#L46-L53 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.

On the use of catalyst.dsl package and deserialize vs CatalystSerde.deserialize

2016-11-13 Thread Jacek Laskowski
talyst/plans/logical/object.scala#L32 [4] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2498 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at

Re: how does isDistinct work on expressions

2016-11-13 Thread Jacek Laskowski
creating a new UDAF? What have you done already? GitHub perhaps? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Sun, Nov 13, 2016 at 12:03 PM, assaf.mendels

Re: Component naming in the PR title

2016-11-13 Thread Jacek Laskowski
t in use (or be acceptable). Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Sat, Nov 12, 2016 at 6:27 PM, Hyukjin Kwon wrote: > Hi all, > > > First of all

Analyzing and reusing cached Datasets

2016-11-19 Thread Jacek Laskowski
roject [id#26L, id#26L AS new#29L] +- Range (0, 1, step=1, splits=Some(8)) == Physical Plan == *Project [id#26L, id#26L AS new#29L] +- *Range (0, 1, step=1, splits=Some(8)) Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.l

Re: Analyzing and reusing cached Datasets

2016-11-20 Thread Jacek Laskowski
m doing it anyway to hunt down the "issue")? 2. Defining an override for sameResult in Range (as LocalRelation and other logical operators)? Somehow I feel Spark could do better. Please guide (and help me get better at this low-level infra of Spark SQL). Thanks! Pozdrawiam, Jacek Las

Use of BroadcastFactory interface (after SPARK-12588 Remove HTTPBroadcast)

2016-11-27 Thread Jacek Laskowski
tom BroadcastFactory (and hence Broadcast) in. WDYT? [1] https://issues.apache.org/jira/browse/SPARK-12588 [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/BroadcastFactory.scala#L25-L30 Pozdrawiam, Jacek Laskowski https://medium.com/@jacek

Re: Kafka Spark structured streaming latency benchmark.

2016-12-20 Thread Jacek Laskowski
Hi, (what timing. Just reviewed CC yesterday!) In ALS they trigger cleaning up shufflemapstages themselves, so if I understood the issue, the streaming part could do it too. Jacek On 19 Dec 2016 11:35 p.m., "Shixiong(Ryan) Zhu" wrote: > Hey Prashant. Thanks for your codes. I did some investig

MapOutputTracker.getMapSizesByExecutorId and mutation on the driver?

2016-12-23 Thread Jacek Laskowski
apache/spark/MapOutputTracker.scala#L133 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski -

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Jacek Laskowski
Hi Michael, That caught my attention... Could you please elaborate on "elastically grow and shrink CPU usage" and how it really works under the covers? It seems that CPU usage is just a "label" for an executor on Mesos. Where's this in the code? Pozdrawiam, J

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Jacek Laskowski
Thanks a LOT, Michael! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Mon, Dec 26, 2016 at 10:04 PM, Michael Gummelt wrote: > In fine-grained mode (which

Why ShuffleManager.registerShuffle takes shuffleId since ShuffleDependency has it too?

2016-12-28 Thread Jacek Laskowski
/core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala#L35 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklask

Why is spark.shuffle.sort.bypassMergeThreshold 200?

2016-12-28 Thread Jacek Laskowski
ld? scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions res3: Int = 200 I'd appreciate any guidance to get the gist of this seemingly magic number. Thanks! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mast

Re: [ANNOUNCE] Announcing Apache Spark 2.1.0

2016-12-29 Thread Jacek Laskowski
Hi Yan, I was surprised when I first noticed that rxin stepped back and a new release manager stepped in. Congrats on your first ANNOUNCE! I can only expect even more great stuff coming in to Spark from the dev team after Reynold spared some time 😉 Can't wait to read the changes... Jace

Why ShuffleMapTask has transient locs and preferredLocs?!

2017-01-03 Thread Jacek Laskowski
(and BlockManagerMaster on the driver) to track the shuffle locations (MapStatuses)? Is my understanding correct? What am I missing? (I'm exploring shuffle system currently and would appreciate comments a lot!) Thanks! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mast

Re: What is mainly different from a UDT and a spark internal type that ExpressionEncoder recognized?

2017-01-03 Thread Jacek Laskowski
e part in codegen. Thanks for sharing your notes! Gonna merge yours with mine! Thanks. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Mon, Jan 2, 2017 at 6:30 PM,

Re: What is mainly different from a UDT and a spark internal type that ExpressionEncoder recognized?

2017-01-03 Thread Jacek Laskowski
Thanks Herman for the explanation. I silently assume that the other points were ok since you did not object? Correct? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com

Re: Why ShuffleMapTask has transient locs and preferredLocs?!

2017-01-04 Thread Jacek Laskowski
tion (!) Thanks a lot. On to digging deeper... Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Tue, Jan 3, 2017 at 10:08 PM, Imran Rashid wrote: > Hi Jacek,

Re: Quick request: prolific PR openers, review your open PRs

2017-01-08 Thread Jacek Laskowski
+1 What an excellent way to offload some of your chores! I have so much to learn from you, Sean! (Now since Sean seems to have a bit more time I'm gonna send a few PRs hoping he spares some time to find merits in them :)) Pozdrawiam, Jacek Laskowski https://medium.com/@jace

protected val mapStatuses is ConcurrentHashMap in both MapOutputTrackerMaster and MapOutputTrackerWorker?

2017-01-08 Thread Jacek Laskowski
github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L84 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jace

scala.MatchError: scala.collection.immutable.Range.Inclusive from catalyst.ScalaReflection.serializerFor?

2017-01-09 Thread Jacek Laskowski
e information. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski - To unsubscribe e-mail:

What about removing TaskContext#getPartitionId?

2017-01-14 Thread Jacek Laskowski
ore/src/main/scala/org/apache/spark/TaskContext.scala#L41 [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L50 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/

Re: What about removing TaskContext#getPartitionId?

2017-01-14 Thread Jacek Laskowski
nless we go through a > deprecation process for it). > > Regards, > Mridul > > > On Sat, Jan 14, 2017 at 2:02 AM, Jacek Laskowski wrote: > > Hi, > > > > Just noticed that TaskContext#getPartitionId [1] is not used and > > moreover the scaladoc is incorrec

Re: What about removing TaskContext#getPartitionId?

2017-01-14 Thread Jacek Laskowski
Hi Sean, Can you elaborate on " it's actually used by Spark"? Where exactly? I'd like to be corrected. What about the scaladoc? Since the method's a public API, I think it should be fixed, shouldn't it? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklas

RpcEnv(Factory) is no longer pluggable? spark.rpc is gone, isn't it?

2017-01-18 Thread Jacek Laskowski
rg/apache/spark/SparkConf.scala#L641 [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala#L32 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow m

Re: RpcEnv(Factory) is no longer pluggable? spark.rpc is gone, isn't it?

2017-01-18 Thread Jacek Laskowski
On Wed, Jan 18, 2017 at 8:57 AM, Jacek Laskowski wrote: > p.s. How to know when the deprecation was introduced? The last change > is for executor blacklisting so git blame does not show what I want :( > Any ideas? Figured that out myself! $ git log --topo-order --graph -u -L 641,641

clientMode in RpcEnv.create in Spark on YARN vs general case (driver vs executors)?

2017-01-18 Thread Jacek Laskowski
ce-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L434 [3] https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L254 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apac

[YARN] $ and $$ in prepareCommand to resolve environment in ExecutorRunnable?

2017-01-24 Thread Jacek Laskowski
arn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L210 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Jacek Laskowski
Wow! At long last. Congrats Burak and Holden! p.s. I was a bit worried that the process of accepting new committers is as hard as passing Sean's sanity checks for PRs, but given this it's so much easier it seems :D Pozdrawiam, Jacek Laskowski https://medium.com/@jace

Why two makeOffers in CoarseGrainedSchedulerBackend? Duplication?

2017-01-26 Thread Jacek Laskowski
spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L211 [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L229 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0

Re: Why two makeOffers in CoarseGrainedSchedulerBackend? Duplication?

2017-01-26 Thread Jacek Laskowski
the other hand, since no one has considered it a small duplication it could be perfectly fine (it did make the code a bit less obvious to me). Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://t

Re: Why two makeOffers in CoarseGrainedSchedulerBackend? Duplication?

2017-01-26 Thread Jacek Laskowski
Hi Imran, Ok, that makes sense for performance reasons. Thanks for bearing with me and explaining that code with so much patience. Appreciated! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https

Re: Typo on spark.apache.org? "cyclic data flow"

2017-01-28 Thread Jacek Laskowski
Hi Nicholas, Interesting. Just this past Monday I was introducing Spark and ran into it, but thought it was my poor English skills :-) Thanks for spotting it! (I also think that the entire welcome page begs for a facelift - it's from pre-2.0 days) Jacek On 28 Jan 2017 8:18 p.m., "Nicholas Ch

Fwd: Google Summer of Code 2017 is coming

2017-02-03 Thread Jacek Laskowski
Hi, Is this something Spark is considering? Would be nice to mark issues as GSoC in JIRA and solicit feedback. What do you think? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com

Re: Remove support for Hadoop 2.5 and earlier?

2017-02-03 Thread Jacek Laskowski
Hi Sean, Given that 3.0.0 is coming, removing the unused versions would be a huge benefit from a maintenance point of view. I'd support removing support for 2.5 and earlier. Speaking of Hadoop support, is anyone considering 3.0.0 support? Can't find any JIRA for this. Pozdrawiam, Jacek

Re: Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-02-03 Thread Jacek Laskowski
understanding of `spark.memory.offHeap.enabled` set to `false` is that it does not disable off-heap memory used in Java NIO for buffers in shuffling, RPC, etc., so the memory is always (?) more than you request for mx using executor-memory. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski

Re: Google Summer of Code 2017 is coming

2017-02-03 Thread Jacek Laskowski
Thanks Sean. You've again been very helpful in setting the right tone on the matter. I stand corrected and have no interest in GSoC anymore. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at

Dynamic Allocation in Core vs YARN -- getDynamicAllocationInitialExecutors vs getInitialTargetExecutorNumber

2017-02-09 Thread Jacek Laskowski
he/spark/deploy/yarn/YarnSparkHadoopUtil.scala#L270 [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L2516 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at

Re: Should we consider a Spark 2.1.1 release?

2017-03-19 Thread Jacek Laskowski
+1 Smaller and more frequent releases (so major releases get even more quality). Jacek On 13 Mar 2017 8:07 p.m., "Holden Karau" wrote: > Hi Spark Devs, > > Spark 2.1 has been out since end of December >

Re: Should we consider a Spark 2.1.1 release?

2017-03-19 Thread Jacek Laskowski
more eyeballs, the fewer the mistakes. If we make very fine/minor releases often we should be able to attract more people who spend their time on testing/verification, which eventually contributes to a higher quality of Spark. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklask

[SQL] Registering custom Rule[LogicalPlan] using extendedResolutionRules by overriding SparkSession, SessionState, and Analyzer only?

2017-03-23 Thread Jacek Laskowski
//github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L107 [3] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/ExperimentalMethods.scala Pozdrawiam, Jacek Laskowski https://medium.com/@

Catalyst: unary or binary expressions that are not UnaryExpressions or BinaryExpressions? Why?

2017-03-29 Thread Jacek Laskowski
our comments. Thanks! p.s. Just a side note, since Unevaluable is an Expression why not extend from Unevaluable directly? I can understand why "extends Expression with Unevaluable" could be very valuable, but wish I could hear what the main motivation behind it was. Thanks doubled! Pozdrawi

Why separate SessionStateBuilder? (it's BaseSessionStateBuilder)

2017-04-23 Thread Jacek Laskowski
ilder.scala#L54 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski - To unsubscribe e-mai

GROUPING SETS as Dataset operator? Ordinals support?

2017-04-25 Thread Jacek Laskowski
rt for ordinals in groupBy and orderBy, but doesn't seem supported in GROUPING SETS. What do you think about adding the features to Spark SQL? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at

[KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

2017-05-01 Thread Jacek Laskowski
/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L145 [2] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L163 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering A

Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

2017-05-01 Thread Jacek Laskowski
>> I'm confused about what you're suggesting. Are you saying that a >> Kafka sink should take a filesystem path as an option? >> >> On Mon, May 1, 2017 at 8:52 AM, Jacek Laskowski wrote: >> > Hi, >> > >> > I've just found out that Ka

Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

2017-05-04 Thread Jacek Laskowski
https://issues.apache.org/jira/browse/SPARK-20597 I'm going to send a PR soon. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Mon, May 1, 2017 at 8:

spark.sql.codegen.comments not in SQLConf?

2017-05-10 Thread Jacek Laskowski
ks. [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L822 [2] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala Pozdrawiam,

Which one preferred -- Dataset.ofRows vs SparkSession.baseRelationToDataFrame?

2017-05-15 Thread Jacek Laskowski
thPlan that looks so similar to the others [3] [3] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2940-L2942 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2 https://bit.ly/mastering-apache-spar

New metrics for WindowExec with number of partitions and frames?

2017-05-26 Thread Jacek Laskowski
Hi, Currently WindowExec gives no metrics in the web UI's Details for Query page. What do you think about adding the number of partitions and frames? That could certainly be super useful, but I am unsure if that's the kind of metrics Spark SQL shows in the details. Pozdrawiam, Jacek

Why does Spark SQL use custom spark.sql.execution.id local property not SparkContext.setJobGroup?

2017-06-21 Thread Jacek Laskowski
a/org/apache/spark/sql/execution/SQLExecution.scala#L63 [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L265 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2 https://b

[SS] Why does ConsoleSink's addBatch convert input DataFrame to show it?

2017-07-07 Thread Jacek Laskowski
l/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala#L51-L53 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com

2.2.0 under Unreleased Versions in JIRA?

2017-07-16 Thread Jacek Laskowski
Hi, Just noticed that the 2.2.0 label is under Unreleased Versions in JIRA. Since it's out, I think only 2.2.1 and 2.3.0 are valid. Correct? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at

Re: 2.2.0 under Unreleased Versions in JIRA?

2017-07-16 Thread Jacek Laskowski
Confirmed. Thanks a lot, Sean. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Sun, Jul 16, 2017 at 3:02 PM, Sean Owen wrote: > Done, it just needed to

Fwd: spark git commit: [SPARK-21472][SQL] Introduce ArrowColumnVector as a reader for Arrow vectors.

2017-07-21 Thread Jacek Laskowski
. SUCCESS [01:41 min] [INFO] Spark Project SQL .. FAILURE [02:14 min] Is it only me, or do others suffer from it too? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2 https://bit.ly/mastering

Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-08 Thread Jacek Laskowski
Hi, Congrats!! Looks like Sean is gonna be less busy these days ;-) Jacek On 7 Aug 2017 5:53 p.m., "Matei Zaharia" wrote: > Hi everyone, > > The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal as > committers. Join me in congratulating both of them and thanking them for > their

[SS] watermark, eventTime and "StreamExecution: Streaming query made progress"

2017-08-11 Thread Jacek Laskowski
"walCommit" : 22 }, "eventTime" : { "avg" : "2017-08-11T07:04:23.782Z", "max" : "2017-08-11T07:04:28.282Z", "min" : "2017-08-11T07:04:19.282Z", "watermark" : "2017-08-11T07:04:08.282Z"

[SS] Collapsing EventTimeWatermark logical operators?

2017-08-12 Thread Jacek Laskowski
istingRDD[timestamp#773,value#774L] Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski - To unsubscri

Fwd: [jira] [Commented] (SPARK-21728) Allow SparkSubmit to use logging

2017-08-30 Thread Jacek Laskowski
picked up (but it was at least 2 days ago) :( I'm using the master at https://github.com/apache/spark/commit/fba9cc8466dccdcd1f6f372ea7962e7ae9e09be1. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-struc

[SS] New numSavedStates metric for StateStoreRestoreExec for saved state?

2017-09-01 Thread Jacek Laskowski
://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L206 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-structured-streaming Mastering

[SS] Writing a test for a possible bug in StateStoreSaveExec with Append output mode?

2017-09-03 Thread Jacek Laskowski
very close to a test and that I could use? Thanks for any help you may offer! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow

Re: [SS] Writing a test for a possible bug in StateStoreSaveExec with Append output mode?

2017-09-04 Thread Jacek Laskowski
state for the key? Example's coming up. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://twitte

[SS] Bug in StreamExecution? currentBatchId and getBatchDescriptionString for web UI

2017-09-09 Thread Jacek Laskowski
6fc6cf88d871f5b05b0ad1a504e0d6213cf9d331#diff-6532dd3b63bdab0364fbcf2303e290e4R294 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Fo

Re: [SS] Bug in StreamExecution? currentBatchId and getBatchDescriptionString for web UI

2017-09-10 Thread Jacek Laskowski
Hi, Please disregard my finding. It does not seem to be a bug, just a small piece of "dead code", as "init" will never be displayed in the web UI: the minimum batch id can only ever be 0, and so getBatchDescriptionString could be a little "improved". Sorry for the noise. Pozdrawi

Re: A little Scala 2.12 help

2017-09-19 Thread Jacek Laskowski
Hi, Nice catch, Sean! Learnt this today. They did say you could learn a lot with Spark! :) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering

Re: Building with sbt "impossible to get artifacts when data has not been loaded"

2015-08-27 Thread Jacek Laskowski
warn] * com.google.code.findbugs:jsr305:1.3.9 -> 2.0.1 [warn] * com.google.guava:guava:11.0.2 -> 14.0.1 [warn] Run 'evicted' to see detailed eviction warnings [success] Total time: 3 s, completed Aug 27, 2015 11:58:18 AM Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | htt

Re: Master build fails ?

2015-11-03 Thread Jacek Laskowski
] ➜ spark git:(master) ✗ java -version java version "1.8.0_66" Java(TM) SE Runtime Environment (build 1.8.0_66-b17) Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode) I'm on Mac OS. Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.

Re: Master build fails ?

2015-11-04 Thread Jacek Laskowski
Hi, It appears it's time to switch to my lovely sbt then! Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski On Tue, Nov 3, 2015 at

Build fails due to...multiple overloaded alternatives of constructor RDDInfo define default arguments?

2015-11-07 Thread Jacek Laskowski
, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski - To unsubscribe, e-mail: dev-unsubscr

Re: Build fails due to...multiple overloaded alternatives of constructor RDDInfo define default arguments?

2015-11-07 Thread Jacek Laskowski
Worked for me. Thanks! Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski On Sat, Nov 7, 2015 at 1:56 PM, Ted Yu wrote: > Created a PR

Re: Need suggestions on monitor Spark progress

2015-11-30 Thread Jacek Laskowski
tions and do whatever you want (log to stdout or whatever)? Just a thought. Pozdrawiam, Jacek -- Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/ Follow me at https://twitter.com/jace

Re: Bringing up JDBC Tests to trunk

2015-12-01 Thread Jacek Laskowski
On Mon, Nov 30, 2015 at 10:53 PM, Josh Rosen wrote: > In SBT, these wind up on the Docker JDBC tests' classpath as a transitive > dependency of the `spark-sql` test JAR. However, what we should be doing is > adding them as explicit test dependencies of the `docker-integration-tests` > subproject,

A bug in Spark standalone? Worker registration and deregistration

2015-12-10 Thread Jacek Laskowski
the workers "./sbin/start-slave.sh spark://localhost:7077". p.s. Are such questions appropriate for this mailing list?

Re: A bug in Spark standalone? Worker registration and deregistration

2015-12-10 Thread Jacek Laskowski
Hi, I'm on yesterday's master HEAD.

Re: A bug in Spark standalone? Worker registration and deregistration

2015-12-10 Thread Jacek Laskowski
On Thu, Dec 10, 2015 at 8:10 PM, Shixiong Zhu wrote: > Jacek, could you create a JIRA for it? I just reproduced it. It's a bug in > how Master handles the Worker disconnection. Hi Shixiong, I'm saved. Kept thinking I'm lost in the sources and see ghosts :-) https://issues.apache.org/jira/browse

[DAGScheduler] resubmitFailedStages, failedStages.clear() and submitStage

2015-12-24 Thread Jacek Laskowski

Re: BUILD FAILURE for Scala 2.11?

2016-01-06 Thread Jacek Laskowski
Hi, Done. See https://github.com/apache/spark/pull/10636 On Thu, Jan 7, 2016 at 8:10

NoClassDefFoundError when starting standalone Master

2016-01-09 Thread Jacek Laskowski
ClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 15 more

Re: NoClassDefFoundError when starting standalone Master

2016-01-09 Thread Jacek Laskowski
Hi, I think the change is related: https://github.com/apache/spark/commit/659fd9d04b988d48960eac4f352ca37066f43f5c as it touches the dependency in pom.xml.

Re: NoClassDefFoundError when starting standalone Master

2016-01-09 Thread Jacek Laskowski
Figured it out and reported https://issues.apache.org/jira/browse/SPARK-12736. Fix's coming...

Re: NoClassDefFoundError when starting standalone Master

2016-01-09 Thread Jacek Laskowski
Hi, https://github.com/apache/spark/pull/10674 Please review and merge at your convenience. Thanks!

Re: [discuss] dropping Python 2.6 support

2016-01-09 Thread Jacek Laskowski
On Sat, Jan 9, 2016 at 1:48 PM, Sean Owen wrote: > (For similar reasons I personally don't favor supporting Java 7 or > Scala 2.10 in Spark 2.x.) That reflects my sentiments as well. Thanks Sean for bringing that up! Jacek

BUILD FAILURE...again?! :( Spark Project External Flume on fire

2016-01-10 Thread Jacek Laskowski
s] [INFO] Spark Project External Flume ... FAILURE [ 1.010 s] [INFO] Spark Project External Flume Assembly .. SKIPPED [1] https://github.com/apache/spark/commit/3ab0138b0fe0f9208b4b476855294a7c729583b7

Re: BUILD FAILURE...again?! :( Spark Project External Flume on fire

2016-01-11 Thread Jacek Laskowski
Thanks Josh. Leaving now so I can't give it a shot, but will report results in a couple of hours. Thanks a lot!

Re: BUILD FAILURE...again?! :( Spark Project External Flume on fire

2016-01-11 Thread Jacek Laskowski
Hi, I've just done a git pull and it worked for me. Looks like https://github.com/apache/spark/commit/f13c7f8f7dc8766b0a42406b5c3639d6be55cf33 fixed the issue (or something in-between). Thanks for such a quick fix! p.s. Had time for swimming :-)

BUILD FAILURE at Spark Project Test Tags for 2.11.7?

2016-01-20 Thread Jacek Laskowski
Test Tags FAILURE [ 0.321 s]

Re: BUILD FAILURE at Spark Project Test Tags for 2.11.7?

2016-01-20 Thread Jacek Laskowski
On Wed, Jan 20, 2016 at 8:48 PM, Marcelo Vanzin wrote: > On Wed, Jan 20, 2016 at 11:46 AM, Jacek Laskowski wrote: >> /Users/jacek/dev/oss/spark/tags/target/scala-2.11/classes... >> [error] Cannot run program "javac": error=2, No such file or directory > > That do
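The `Cannot run program "javac": error=2, No such file or directory` message quoted above usually means the build process could not find a `javac` executable. A minimal sketch for checking that, assuming Python 3 is available on the machine: `locate_javac` is a hypothetical helper name, not part of Spark or its build scripts, and the fallback to `JAVA_HOME/bin/javac` is an assumption about how the JDK is laid out locally.

```python
import os
import shutil


def locate_javac(env=None):
    """Return a path to javac found via PATH or JAVA_HOME, or None.

    Hypothetical helper for illustration only; not part of Spark.
    """
    env = os.environ if env is None else env

    # First look on PATH, which is where a spawned build process searches.
    found = shutil.which("javac", path=env.get("PATH", ""))
    if found:
        return found

    # Assumed fallback: a JDK installed under JAVA_HOME with bin/javac.
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "javac")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate

    return None


if __name__ == "__main__":
    path = locate_javac()
    print(path if path else "javac not found on PATH or under JAVA_HOME")
```

If the script reports that `javac` is missing while `java -version` still works, the machine may have only a JRE on its `PATH`, which would be consistent with the error in the quoted build log.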

