Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Burak Yavuz
+1. Excited to see more stateful workloads with Structured Streaming! Best, Burak On Wed, Jan 10, 2024 at 8:21 AM Praveen Gattu wrote: > +1. This brings Structured Streaming a good solution for customers wanting > to build stateful stream processing applications. > > On Wed, Jan 10, 2024 at

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-05 Thread Burak Yavuz
I'm also a +1 on the newer APIs. We had a lot of learnings from using flatMapGroupsWithState and I believe that we can make the APIs a lot easier to use. On Wed, Nov 29, 2023 at 6:43 PM Anish Shrigondekar wrote: > Hi dev, > > Addressed the comments that Jungtaek had on the doc. Bumping the

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Burak Yavuz
+1 on adding to Spark. Community involvement will make the XML reader better. Best, Burak On Wed, Jul 19, 2023 at 3:25 AM Martin Andersson wrote: > Alright, makes sense to add it then. > -- > *From:* Hyukjin Kwon > *Sent:* Wednesday, July 19, 2023 11:01 > *To:*

Re: SPIP: Catalog API for view metadata

2020-08-13 Thread Burak Yavuz
My high-level comment here is that as a naive person, I would expect a View to be a special form of Table that SupportsRead but doesn't SupportsWrite. loadTable in the TableCatalog API should load both tables and views. This way you avoid multiple RPCs to a catalog or data source or metastore, and

Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Burak Yavuz
+1 Best, Burak On Tue, Jun 9, 2020 at 1:48 PM Shixiong(Ryan) Zhu wrote: > +1 (binding) > > Best Regards, > Ryan > > > On Tue, Jun 9, 2020 at 4:24 AM Wenchen Fan wrote: > >> +1 (binding) >> >> On Tue, Jun 9, 2020 at 6:15 PM Dr. Kent Yao wrote: >> >>> +1 (non-binding) >>> >>> >>> >>> -- >>>

Re: [DISCUSS] "complete" streaming output mode

2020-05-21 Thread Burak Yavuz
Oh wow. I never thought this would be up for debate. I use complete mode VERY frequently for all my dashboarding use cases. Here are some of my thoughts: > 1. It destroys the purpose of watermark and forces Spark to maintain all of state rows, growing incrementally. It only works when all keys

Re: [DatasourceV2] Default Mode for DataFrameWriter not Dependent on DataSource Version

2020-05-20 Thread Burak Yavuz
Hey Russell, Great catch on the documentation. It seems out of date. I honestly am against having different DataSources having different default SaveModes. Users will have no clue if a DataSource implementation is V1 or V2. It seems weird that the default value can change for something that I

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Burak Yavuz
+1 On Mon, Mar 9, 2020 at 4:55 PM Reynold Xin wrote: > +1 > > > > On Mon, Mar 09, 2020 at 3:53 PM, John Zhuge wrote: > >> +1 (non-binding) >> >> On Mon, Mar 9, 2020 at 1:32 PM Michael Heuer wrote: >> >>> +1 (non-binding) >>> >>> I am disappointed however that this only mentions API and not

Re: Issues with Delta Lake on 3.0.0 preview + preview 2

2019-12-30 Thread Burak Yavuz
I can't imagine any Spark data source using Spark internals compiled on Spark 2.4 working on 3.0 out of the box. There are many breaking changes. I'll try to get a *dev* branch for 3.0 soon (mid Jan). Best, Burak On Mon, Dec 30, 2019, 8:53 AM Jean-Georges Perrin wrote: > Hi there, > > Trying to

Re: Static partitioning in partitionBy()

2019-05-07 Thread Burak Yavuz
It depends on the data source. Delta Lake (https://delta.io) allows you to do it with the .option("replaceWhere", "c = c1"). With other file formats, you can write directly into the partition directory (tablePath/c=c1), but you lose atomicity. On Tue, May 7, 2019, 6:36 AM Shubham Chaurasia
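
The partition directory mentioned above follows the Hive-style `column=value` layout. A minimal sketch of how such a path is composed, assuming a hypothetical helper `partition_dir` and an illustrative table path (neither comes from the thread):

```python
import posixpath


def partition_dir(table_path, column, value):
    """Build the Hive-style directory for one static partition.

    Hypothetical helper for illustration only; it does not escape
    special characters in `value` the way a real writer would.
    """
    return posixpath.join(table_path, f"{column}={value}")


# Illustrative table path; "c" and "c1" mirror the example in the reply.
target = partition_dir("/data/events", "c", "c1")
print(target)  # /data/events/c=c1
```

Writing files straight into that directory bypasses the data source's commit protocol, which is why the reply notes that atomicity is lost.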

Re: Welcome Jose Torres as a Spark committer

2019-01-29 Thread Burak Yavuz
Congrats Jose! On Tue, Jan 29, 2019 at 10:50 AM Xiao Li wrote: > Congratulations! > > Xiao > > On Tue, Jan 29, 2019 at 10:48 AM Shixiong Zhu wrote: > >> Hi all, >> >> The Apache Spark PMC recently added Jose Torres as a committer on the >> project. Jose has been a major contributor to Structured Streaming.

Re: [SS] FlatMapGroupsWithStateExec with no commitTimeMs metric?

2018-11-25 Thread Burak Yavuz
Probably just oversight. Anyone is welcome to add it :) On Sun, Nov 25, 2018 at 8:55 AM Jacek Laskowski wrote: > Hi, > > Why is FlatMapGroupsWithStateExec not measuring the time taken on state > commit [1](like StreamingDeduplicateExec [2] and StreamingGlobalLimitExec > [3])? Is this on

Re: Structured Streaming with Watermark

2018-10-18 Thread Burak Yavuz
Hi Sandeep, Watermarks are used in aggregation queries to ensure correctness and clean up state. They don't allow you to drop records in map-only scenarios, which you have in your example. If you would do a test of `groupBy().count()` then you will see that the count doesn't increase with the

Re: Welcoming some new committers

2018-03-04 Thread Burak Yavuz
Congrats all! Well deserved. On Sat, Mar 3, 2018 at 4:10 AM, Marco Gaido wrote: > Congratulations to you all! > > On 3 Mar 2018 8:30 a.m., "Liang-Chi Hsieh" wrote: > >> >> Congrats to everyone! >> >> >> Kazuaki Ishizaki wrote >> > Congratulations to

Re: queryable state & streaming

2017-12-08 Thread Burak Yavuz
Hi Stavros, Queryable state is definitely on the roadmap! We will revamp the StateStore API a bit, and a queryable StateStore is definitely one of the things we are thinking about during that revamp. Best, Burak On Dec 8, 2017 9:57 AM, "Stavros Kontopoulos" wrote: >

Re: Reload some static data during struct streaming

2017-11-13 Thread Burak Yavuz
I think if you don't cache the jdbc table, then it should auto-refresh. On Mon, Nov 13, 2017 at 1:21 PM, spark receiver wrote: > Hi > > I’m using struct streaming(spark 2.2) to receive Kafka msg ,it works > great. The thing is I need to join the Kafka message with a

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-04 Thread Burak Yavuz
+1 On Fri, Nov 3, 2017 at 10:02 PM, vaquar khan wrote: > +1 > > On Fri, Nov 3, 2017 at 8:14 PM, Weichen Xu > wrote: > >> +1. >> >> On Sat, Nov 4, 2017 at 8:04 AM, Matei Zaharia >> wrote: >> >>> +1 from me too. >>> >>>

Re: welcoming Takuya Ueshin as a new Apache Spark committer

2017-02-13 Thread Burak Yavuz
Congrats Takuya! On Mon, Feb 13, 2017 at 2:17 PM, Dilip Biswal wrote: > Congratulations, Takuya! > > Regards, > Dilip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > - Original message - > From: Takeshi Yamamuro >

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Burak Yavuz
Thank you very much everyone! Hoping to help out the community as much as I can! Best, Burak On Tue, Jan 24, 2017 at 2:29 PM, Jacek Laskowski wrote: > Wow! At long last. Congrats Burak and Holden! > > p.s. I was a bit worried that the process of accepting new committers > is

Re: [SQL][SPARK-14160] Maximum interval for o.a.s.sql.functions.window

2017-01-18 Thread Burak Yavuz
Hi Maciej, I believe it would be useful to either fix the documentation or fix the implementation. I'll leave it to the community to comment on. The code right now disallows intervals provided in months and years, because they are not a "consistently" fixed amount of time. A month can be 28, 29,
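
The point about months not being a fixed amount of time can be checked with a few lines of standard-library Python. This is only an illustrative sketch; the helper name `month_lengths` and the chosen years are assumptions, not part of the thread or of Spark:

```python
import calendar


def month_lengths(year):
    """Return the number of days in each of the 12 months of `year`."""
    return [calendar.monthrange(year, month)[1] for month in range(1, 13)]


# 2016 is a leap year, 2017 is not, so together they cover every case.
distinct = sorted(set(month_lengths(2016) + month_lengths(2017)))
print(distinct)  # [28, 29, 30, 31]
```

Because a "1 month" interval could mean any of these four lengths, it cannot be mapped to a single fixed window duration, unlike days, hours, or seconds.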

Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-09-29 Thread Burak Yavuz
+1 On Sep 29, 2016 4:33 PM, "Kyle Kelley" wrote: > +1 > > On Thu, Sep 29, 2016 at 4:27 PM, Yin Huai wrote: > >> +1 >> >> On Thu, Sep 29, 2016 at 4:07 PM, Luciano Resende >> wrote: >> >>> +1 (non-binding) >>> >>> On Wed, Sep 28,

Re: Spark SQL JSON Column Support

2016-09-28 Thread Burak Yavuz
I would really love something like this! It would be great if it doesn't throw away corrupt_records like the Data Source. On Wed, Sep 28, 2016 at 11:02 AM, Nathan Lande wrote: > We are currently pulling out the JSON columns, passing them through > read.json, and then

Re: Remove / update version in spark-packages.org

2016-07-26 Thread Burak Yavuz
Hi, It's bad practice to change jars for the same version and is prohibited in Spark Packages. Please bump your version number and make a new release. Best regards, Burak On Tue, Jul 26, 2016 at 3:51 AM, Julio Antonio Soto de Vicente < ju...@esbet.es> wrote: > Hi all, > > Maybe I am missing

Re: spark-packages with maven

2016-07-15 Thread Burak Yavuz
Hi Ismael and Jacek, If you use Maven for building your applications, you may use the spark-package command line tool ( https://github.com/databricks/spark-package-cmd-tool) to perform packaging. It requires you to build your jar using maven first, and then does all the extra magic that Spark

Re: [VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-08 Thread Burak Yavuz
+1 On Tue, Mar 8, 2016 at 10:59 AM, Andrew Or wrote: > +1 > > 2016-03-08 10:59 GMT-08:00 Yin Huai : > >> +1 >> >> On Mon, Mar 7, 2016 at 12:39 PM, Reynold Xin wrote: >> >>> +1 (binding) >>> >>> >>> On Sun, Mar 6, 2016 at 12:08

Re: Spark not able to fetch events from Amazon Kinesis

2016-01-30 Thread Burak Yavuz
Hi Yash, I've run into multiple problems due to version incompatibilities, either due to protobuf or jackson. That may be your culprit. The problem is that all failures in the Kinesis Client Library are silent and therefore don't show up in the logs. It's very hard to debug those buggers. Best, Burak

Re: Export BLAS module on Spark MLlib

2015-11-30 Thread Burak Yavuz
Or you could also use reflection like in this Spark Package: https://github.com/brkyvz/lazy-linalg/blob/master/src/main/scala/com/brkyvz/spark/linalg/BLASUtils.scala Best, Burak On Mon, Nov 30, 2015 at 12:48 PM, DB Tsai wrote: > The workaround is have your code in the same

Re: [VOTE] Release Apache Spark 1.5.0 (RC3)

2015-09-03 Thread Burak Yavuz
+1. Tested complex R package support (Scala + R code), BLAS and DataFrame fixes good. Burak On Thu, Sep 3, 2015 at 8:56 AM, mkhaitman wrote: > Built and tested on CentOS 7, Hadoop 2.7.1 (Built for 2.6 profile), > Standalone without any problems. Re-tested dynamic

Re: FrequentItems in spark-sql-execution-stat

2015-08-01 Thread Burak Yavuz
Hi Yucheng, Thanks for pointing out the issue. You are correct, in the case that the final map is completely empty after the merge, we do need to add the final element to the map, with the correct count (decrement the count with the max count that was already in the map). I'll submit a fix for
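
The edge case discussed above can be illustrated with a bounded frequent-items counter. The sketch below is a weighted Misra-Gries-style update written for illustration; it is not Spark's FrequentItems code, and the function name `add` and the `size` parameter are assumptions:

```python
def add(counts, key, count, size):
    """Add `count` occurrences of `key` to a map holding at most `size` keys."""
    if key in counts:
        counts[key] += count
    elif len(counts) < size:
        counts[key] = count
    else:
        # The map is full: decrement every entry instead of growing the map.
        min_count = min(counts.values()) if counts else 0
        if count >= min_count:
            # Insert the new key, drop entries that would fall to zero,
            # then subtract the old minimum from the survivors.
            counts[key] = count
            for k in [k for k, v in counts.items() if v <= min_count]:
                del counts[k]
            for k in counts:
                counts[k] -= min_count
        else:
            # The new key is the smallest: it is absorbed by the decrement.
            for k in counts:
                counts[k] -= count
    return counts


counts = {}
add(counts, "a", 2, size=2)
add(counts, "b", 2, size=2)
# Every existing entry is evicted, yet "c" must survive with a reduced count.
add(counts, "c", 5, size=2)
print(counts)  # {'c': 3}
```

In this sketch the new element is kept with its count reduced by the amount removed from the evicted entries, rather than leaving the map empty.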

Re: BlockMatrix multiplication

2015-07-17 Thread Burak Yavuz
shuffling given the blocks co-location? Best regards, Alexander *From:* Burak Yavuz [mailto:brk...@gmail.com] *Sent:* Wednesday, July 15, 2015 3:29 PM *To:* Ulanov, Alexander *Cc:* Rakesh Chalasani; dev@spark.apache.org *Subject:* Re: BlockMatrix multiplication Hi Alexander, I just

Re: BlockMatrix multiplication

2015-07-15 Thread Burak Yavuz
() - t) / 1e9) Best regards, Alexander *From:* Ulanov, Alexander *Sent:* Tuesday, July 14, 2015 6:24 PM *To:* 'Burak Yavuz' *Cc:* Rakesh Chalasani; dev@spark.apache.org *Subject:* RE: BlockMatrix multiplication Hi Burak, Thank you for explanation! I will try to make a diagonal

Re: BlockMatrix multiplication

2015-07-14 Thread Burak Yavuz
Hi Alexander, From your example code, using the GridPartitioner, you will have 1 column, and 5 rows. When you perform an A^T^A multiplication, you will generate a separate GridPartitioner with 5 columns and 5 rows. Therefore you are observing a huge shuffle. If you would generate a diagonal-block

Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-09 Thread Burak Yavuz
+1 nonbinding. On Thu, Jul 9, 2015 at 7:38 AM, Sean Owen so...@cloudera.com wrote: +1 nonbinding. All previous RC issues appear resolved. All tests pass with the -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver invocation. Signatures et al are OK. On Thu, Jul 9, 2015 at 6:55 AM, Patrick

Re: [GraphX] Graph 500 graph generator

2015-06-24 Thread Burak Yavuz
Hi Ryan, If you can get past the paperwork, I'm sure this can make a great Spark Package (http://spark-packages.org). People then can use it for benchmarking purposes, and I'm sure people will be looking for graph generators! Best, Burak On Wed, Jun 24, 2015 at 7:55 AM, Carr, J. Ryan

Re: unsafe/compile error

2015-06-21 Thread Burak Yavuz
In addition, if you want to run a single suite, you may use: mllib/testOnly $SUITE_NAME with sbt. On Jun 21, 2015 10:32 AM, Burak Yavuz brk...@gmail.com wrote: You need to build an assembly jar for the cluster tests to pass. You may use 'sbt assembly/assembly'. Best, Burak On Jun 21, 2015 3

Re: unsafe/compile error

2015-06-21 Thread Burak Yavuz
You need to build an assembly jar for the cluster tests to pass. You may use 'sbt assembly/assembly'. Best, Burak On Jun 21, 2015 3:43 AM, acidghost andreajemm...@gmail.com wrote: After an sbt update the tests run. But all the cluster ones fail on task size should be small in both training and

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Burak Yavuz
+1 Tested on Mac OS X Burak On Thu, Jun 4, 2015 at 6:35 PM, Calvin Jia jia.cal...@gmail.com wrote: +1 Tested with input from Tachyon and persist off heap. On Thu, Jun 4, 2015 at 6:26 PM, Timothy Chen tnac...@gmail.com wrote: +1 Been testing cluster mode and client mode with mesos with

Re: Ivy support in Spark vs. sbt

2015-06-04 Thread Burak Yavuz
Hi Marcelo, This is interesting. Can you please send me links to any failing builds if you see that problem? For now you can set a conf: `spark.jars.ivy` to use a path other than `~/.ivy2` for Spark. Thanks, Burak On Thu, Jun 4, 2015 at 4:29 AM, Sean Owen so...@cloudera.com wrote: I've

Re: Spark 2.0: Rearchitecting Spark for Mobile, Local, Social

2015-04-01 Thread Burak Yavuz
This is awesome! I can write the apps for it, to make the Web UI more functional! On Wed, Apr 1, 2015 at 12:37 AM, Tathagata Das tathagata.das1...@gmail.com wrote: This is a significant effort that Reynold has undertaken, and I am super glad to see that it's finally taking a concrete form.

Re: Which linear algebra interface to use within Spark MLlib?

2015-03-20 Thread Burak Yavuz
Hi, We plan to add a more comprehensive local linear algebra package for MLlib 1.4. This local linear algebra package can then easily be extended to BlockMatrix to support the same operations in a distributed fashion. You may find the JIRA to track this here: SPARK-6442

Re: [mllib] State of Multi-Model training

2014-09-16 Thread Burak Yavuz
Hi Kyle, I'm actively working on it now. It's pretty close to completion, I'm just trying to figure out bottlenecks and optimize as much as possible. As Phase 1, I implemented multi model training on Gradient Descent. Instead of performing Vector-Vector operations on rows (examples) and

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-28 Thread Burak Yavuz
+1. Tested MLlib algorithms on Amazon EC2, algorithms show speed-ups between 1.5-5x compared to the 1.0.2 release. - Original Message - From: Patrick Wendell pwend...@gmail.com To: dev@spark.apache.org Sent: Thursday, August 28, 2014 8:32:11 PM Subject: Re: [VOTE] Release Apache Spark

Re: Hello All

2014-08-05 Thread Burak Yavuz
Hi Guru, Take a look at: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark It has all the information you need on how to contribute to Spark. Also take a look at: https://issues.apache.org/jira/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

Re: 15 new MLlib algorithms

2014-07-09 Thread Burak Yavuz
Hi, The roadmap for the 1.1 release and MLlib includes algorithms such as: Non-negative matrix factorization, Sparse SVD, Multiclass decision tree, Random Forests (?) and optimizers such as: ADMM, Accelerated gradient methods also a statistical toolbox that includes: descriptive statistics,