SQL Visualization for cached Dataset

2018-01-02 Thread Tomasz Gawęda
Hi, Recently I had to optimize a few Apache Spark SQL queries. Some of the Datasets were reused, so they were cached. However, after caching I don't see the SQL Visualization for the cached Dataset in the Spark UI - I see only an InMemoryRelation node. The explain result at the bottom of the page still has full
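A minimal sketch of the scenario described above (names and sizes are illustrative, not from the original mail): a Dataset is cached and reused, after which queries over it show up in the SQL tab as an InMemoryRelation / InMemoryTableScan node rather than the full plan visualization.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cached-dataset-sql-ui").getOrCreate()
import spark.implicits._

// Build a Dataset that will be reused by several queries.
val base = spark.range(0, 1000000).toDF("id").withColumn("bucket", $"id" % 10)

base.cache()    // mark for caching
base.count()    // first action materializes the cache

// Queries reusing the cached Dataset; in the Spark UI SQL tab their plans
// collapse into an InMemoryRelation node once the cache is in place.
base.groupBy("bucket").count().show()
base.filter($"bucket" === 0).count()
```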

Broken SQL Visualization?

2018-01-15 Thread Tomasz Gawęda
Hi, today I have updated my test cluster to the current Spark master; after that my SQL Visualization page started to crash with the following error in JS: [inline screenshot attachment] The screenshot was cut for readability and to hide internal server names ;) It may be caused by upgrade or b

Re: Broken SQL Visualization?

2018-01-17 Thread Tomasz Gawęda
at 7:07 AM, Ted Yu <yuzhih...@gmail.com> wrote: Did you include any picture? Looks like the picture didn't go through. Please use a third-party site. Thanks ---- Original message ---- From: Tomasz Gawęda <tomasz.gaw...@outlook.com> Date: 1/15/18 2:07

Dataset.localCheckpoint?

2018-01-22 Thread Tomasz Gawęda
Hi, Today I saw that there is no localCheckpoint() function in Dataset. Is there any reason for that? Checkpointing can truncate logical plans, but in some cases it's quite expensive to save the whole Dataset to disk. Is there any workaround for this? Pozdrawiam / Best regards, Tomek Gawęda

Re: Dataset.localCheckpoint?

2018-01-23 Thread Tomasz Gawęda
Hi, sorry again, I was wrong - it was added in 2.3 by Fernando Pereira. Pozdrawiam / Best regards, Tomek Gawęda On 2018-01-22 19:32, Tomasz Gawęda wrote: > Hi, > > Today I saw that there is no localCheckpoint() function in Dataset. Is > there any reason for that? Checkpointing
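A minimal sketch (assuming Spark 2.3+, where Dataset.localCheckpoint is available) of how it truncates a long lineage without writing the whole Dataset to a reliable checkpoint directory; the chained withColumn calls are purely illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("local-checkpoint").getOrCreate()
import spark.implicits._

// Build a Dataset with a deliberately long lineage.
val df = (1 to 10).foldLeft(spark.range(0, 100000).toDF("id")) { (acc, i) =>
  acc.withColumn(s"c$i", $"id" + i)
}

// localCheckpoint() truncates the logical plan using executor-local storage,
// avoiding the cost of Dataset.checkpoint(), which persists the data to a
// reliable (e.g. HDFS) checkpoint directory.
val truncated = df.localCheckpoint()

truncated.explain()   // the plan now starts from the locally checkpointed data
```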

Re: eager execution and debuggability

2018-05-14 Thread Tomasz Gawęda
Hi, > I agree, it would be great if we could make the errors more clear about where the error happened (user code or in Spark code) and what assumption was violated. The problem is that this is a really hard thing to do generally, like Reynold said. I think we should look for individual cases

Preventing predicate pushdown

2018-05-15 Thread Tomasz Gawęda
Hi, while working with the JDBC datasource I saw that many "or" clauses with non-equality operators cause huge performance degradation of the SQL query sent to the database (DB2). For example: val df = spark.read.format("jdbc").(other options to parallelize load).load() df.where(s"(date1 > $param1 and (date1
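A minimal sketch of the pattern described above (connection options, table and column names, and parameter values are illustrative; the original mail elides the parallelization options): several non-equality comparisons are combined with "or" in the where clause, and with predicate pushdown these comparisons end up in the WHERE clause of the SQL sent to DB2.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-or-predicates").getOrCreate()

// Illustrative JDBC options; the partitioning options parallelize the load.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:db2://dbhost:50000/MYDB")   // hypothetical connection URL
  .option("dbtable", "SALES")                      // hypothetical table
  .option("partitionColumn", "ID")
  .option("lowerBound", "0")
  .option("upperBound", "1000000")
  .option("numPartitions", "16")
  .load()

// Ranges on several date columns combined with "or"; these non-equality
// predicates are pushed down into the query that Spark issues to the database.
val param1 = "2018-01-01"
val param2 = "2018-02-01"
val filtered = df.where(
  s"(DATE1 > '$param1' and DATE1 < '$param2') or (DATE2 > '$param1' and DATE2 < '$param2')"
)
filtered.explain()   // PushedFilters in the scan show what reached the JDBC source
```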

Re: Preventing predicate pushdown

2018-05-15 Thread Tomasz Gawęda
, Wenchen On Tue, May 15, 2018 at 8:33 PM, Tomasz Gawęda <tomasz.gaw...@outlook.com> wrote: Hi, while working with the JDBC datasource I saw that many "or" clauses with non-equality operators cause huge performance degradation of the SQL query sent to the database (DB2). For

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Tomasz Gawęda
Hi, what is the status of Continuous Processing + Aggregations? As far as I remember, Jose Torres said it should be easy to perform aggregations if coalesce(1) works. IIRC it's already merged to master. Is this work in progress? If yes, it would be great to have full aggregation/join support i
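For context, a minimal sketch (broker address and topic are hypothetical) of the continuous processing mode referred to above; in Spark 2.3 it supports only map-like operations, and aggregations/joins are the missing piece being asked about.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("continuous-processing").getOrCreate()

// Continuous processing (Spark 2.3+): record-at-a-time, low-latency execution.
val query = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // hypothetical broker
  .option("subscribe", "events")                      // hypothetical topic
  .load()
  .selectExpr("CAST(value AS STRING) AS value")       // map-like operations only
  .writeStream
  .format("console")
  .trigger(Trigger.Continuous("1 second"))            // continuous trigger
  .start()

query.awaitTermination()
```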

Re: Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-29 Thread Tomasz Gawęda
Hi, The tweet linked on the issue suggests some Spark error, but I didn't dig into it to find the root cause. At least, it's quite confusing behaviour. Pozdrawiam/Best regards, Tomek On 29.08.2018 at 6:44 PM Nicholas Chammas wrote: Dunno if I made a silly mistake, but I wanted to bring some attention to
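A minimal sketch of the kind of self-join the thread title describes (purely illustrative; the actual query from the linked issue is not reproduced here): two DataFrames derived from the same source are joined, and because both sides resolve to the same underlying attributes, the join condition can be resolved in surprising ways.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("same-source-join").getOrCreate()
import spark.implicits._

val source = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")

// Two DataFrames derived from the same source share the same lineage.
val left  = source.filter($"id" > 1)
val right = source.filter($"id" < 3)

// The condition references columns that exist on both sides with identical
// attribute IDs, which is where confusing or incorrect results can come from.
val joined = left.join(right, left("id") === right("id"))
joined.explain()
joined.show()
```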

Re: spark2.0 can't run SqlNetworkWordCount

2016-07-25 Thread Tomasz Gawęda
Hi, Please change the Scala version to 2.11. As far as I know, Spark packages are now built with Scala 2.11, and you've got the other - 2.10 - version. From: kevin Sent: 25 July 2016 11:33 To: user.spark; dev.spark Subject: spark2.0 can't run SqlNetworkWordCount hi,all:
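A minimal build.sbt sketch of the suggested fix (version numbers are illustrative for the Spark 2.0 line): the project's Scala version must match the Scala 2.11 that the Spark 2.0 artifacts are built with.

```scala
// build.sbt: use Scala 2.11 to match the Spark 2.0 artifacts.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "2.0.0",   // %% appends _2.11
  "org.apache.spark" %% "spark-sql"       % "2.0.0",
  "org.apache.spark" %% "spark-streaming" % "2.0.0"
)
```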

Real time streaming in Spark

2016-08-29 Thread Tomasz Gawęda
Hi everyone, I wonder if there are plans to implement real-time streaming in Spark. I see that in Spark 2.0 Trigger can have more implementations than ProcessingTime. In my opinion real-time streaming (i.e. reacting to every event, like continuous queries in Apache Ignite) will be very useful a

Re: Contribution to Apache Spark

2016-09-03 Thread Tomasz Gawęda
Hi, Contribution rules are described here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark Pozdrawiam / Best regards, Tomek Gawęda On 2016-09-03 at 21:58, aditya1702 wrote: Hello, I am Aditya Vyas and I am currently in my third year of college doing BTech in my engi

Re: Spark Improvement Proposals

2016-10-16 Thread Tomasz Gawęda
Hi everyone, I'm quite late with my answer, but I think my suggestions may help a little bit. :) Many technical and organizational topics were mentioned, but I want to focus on the negative posts about Spark and on the "haters". I really like Spark. Ease of use, speed, a very good community - it'

Re: Spark Improvement Proposals

2016-10-17 Thread Tomasz Gawęda
StackOverflow or other ways) Pozdrawiam / Best regards, Tomasz From: Cody Koeninger Sent: 17 October 2016 16:46 To: Debasish Das CC: Tomasz Gawęda; dev@spark.apache.org Subject: Re: Spark Improvement Proposals I think narrowly focusing on Flink or benchmarks is miss