Help with basic stream action

2018-10-30 Thread Basil Hariri
Hi all, Sorry if this isn't the right place to ask basic questions, but I'm at the end of my rope here - please let me know where else I can get help if this isn't the right place. I'm trying to continuously read from a Kafka topic and send the number of rows Spark has received to a metric

Re: DataSourceV2 hangouts sync

2018-10-30 Thread Wenchen Fan
Hi all, I spent some time thinking about the roadmap, and came up with an initial list:
SPARK-25390: data source V2 API refactoring
SPARK-24252: add catalog support
SPARK-25531: new write APIs for data source v2
SPARK-25190: better operator pushdown API
Streaming rate control API
Custom metrics

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Jungtaek Lim
OK, thanks for clarifying. I guess it is one of the major features in the streaming area and would be nice to add, but I also agree it would require a huge investigation. On Wed, Oct 31, 2018 at 8:06 AM, Michael Armbrust wrote: > Agree. Just curious, could you explain what do you mean by "negation"? >> Does it mean

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Michael Armbrust
> Agree. Just curious, could you explain what do you mean by "negation"? > Does it mean applying retraction on aggregated? Yeah, exactly. Our current streaming aggregation assumes that the input is in append mode, and multiple aggregations break this.

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Jungtaek Lim
Thanks, Michael, for explaining the activity on SS as well as giving your opinion on some items! Replying inline. On Wed, Oct 31, 2018 at 5:44 AM, Michael Armbrust wrote: > Thanks for bringing up some possible future directions for streaming. Here > are some thoughts: > - I personally view all of the activity

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-30 Thread Ryan Blue
+1 On Tue, Oct 30, 2018 at 4:42 AM Wenchen Fan wrote: > Thanks for reporting the bug! I'll list it as a known issue for 2.4.0 > > I'm adding my own +1, since all the known blockers are resolved. > > On Tue, Oct 30, 2018 at 2:56 PM Xiao Li wrote: > >> Yes, this is not a blocker. >>

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Stavros Kontopoulos
@Michael any update about queryable state? Stavros On Tue, Oct 30, 2018 at 10:43 PM, Michael Armbrust wrote: > Thanks for bringing up some possible future directions for streaming. Here > are some thoughts: > - I personally view all of the activity on Spark SQL also as activity on >

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Michael Armbrust
Thanks for bringing up some possible future directions for streaming. Here are some thoughts: - I personally view all of the activity on Spark SQL also as activity on Structured Streaming. The great thing about building streaming on catalyst / tungsten is that continued improvement to these

Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

2018-10-30 Thread Kazuaki Ishizaki
Hi Reynold, Thank you for your comments. They are great points. 1) Yes, it is not easy to design an IR that is expressive enough. We can learn concepts from good examples like HyPer, Weld, and others. They are expressive and not complicated. The detail cannot be captured yet, 2) To introduce

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-30 Thread Wenchen Fan
Thanks for reporting the bug! I'll list it as a known issue for 2.4.0 I'm adding my own +1, since all the known blockers are resolved. On Tue, Oct 30, 2018 at 2:56 PM Xiao Li wrote: > Yes, this is not a blocker. > "spark.sql.optimizer.nestedSchemaPruning.enabled" is intentionally off by >

Why does spark.range(1).write.mode("overwrite").saveAsTable("t1") throw an Exception?

2018-10-30 Thread Jacek Laskowski
Hi, Just ran into it today and wonder whether it's a bug or something I may have missed before.
scala> spark.version
res21: String = 2.3.2
// that's OK
scala> spark.range(1).write.saveAsTable("t1")
org.apache.spark.sql.AnalysisException: Table `t1` already exists.; at

Re: Some PRs not automatically linked to JIRAs

2018-10-30 Thread Hyukjin Kwon
The duplicated link problem still looks persistent: https://issues.apache.org/jira/browse/SPARK-25881 https://issues.apache.org/jira/browse/SPARK-25880 I suspect there are two places that run this script. Not a big deal, but only specific people can fix this. I am leaving another reminder

Re: Plan on Structured Streaming in next major/minor release?

2018-10-30 Thread Jungtaek Lim
Adding more: again, it doesn't mean they're feasible to do. Just a kind of brainstorming.
* SPARK-20568: Delete files after processing in structured streaming
  * There hasn't been consensus regarding supporting this: there were voices for both YES and NO.
* Support multiple levels of

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-30 Thread Xiao Li
Yes, this is not a blocker. "spark.sql.optimizer.nestedSchemaPruning.enabled" is intentionally off by default. As DB Tsai said, column pruning of nested schema for Parquet tables is experimental. In this release, we encourage the whole community to try this new feature but it might have bugs like
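
For readers who want to try the experimental feature mentioned above, a minimal sketch of opting in might look like the following. Only the configuration key comes from the thread; the object name, application name, and local master are hypothetical, chosen purely for illustration, and this is not presented as the way the committers tested it.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; only the config key below appears in the thread.
object NestedSchemaPruningDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nested-schema-pruning-demo") // hypothetical name
      .master("local[*]")                    // assumption: local run for experimentation
      // Off by default in 2.4.0 according to the thread; enable it explicitly.
      .config("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
      .getOrCreate()

    // Read the setting back to confirm it took effect for this session.
    println(spark.conf.get("spark.sql.optimizer.nestedSchemaPruning.enabled"))

    spark.stop()
  }
}
```

Since the thread describes the feature as experimental and possibly buggy, it seems prudent to enable it per session like this rather than cluster-wide.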

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-30 Thread DB Tsai
+0 I understand that schema pruning is an experimental feature in Spark 2.4, and that it can help read performance a lot, as people are trying to keep hierarchical data in nested format. We just found a serious bug: it could fail the Parquet reader if a nested field and a top-level field are