Questions about Stateful Operations in SS

2017-07-26 Thread Zhang, Lubo
Hi all, I have a question about the stateful operations [map/flatMap]GroupsWithState in Structured Streaming. The issue is as follows. Taking the StructuredSessionization example: I first input two words, apache and spark, in batch 0, then input another word, Hadoop, in batch 1 until timeout

Re: Questions about Stateful Operations in SS

2017-07-26 Thread Tathagata Das
Hello Lubo, The idea of timeouts is to make a best-effort, last-resort attempt to process a key when it has not received data for a while. With a processing-time timeout of 1 minute, the system guarantees that it will not time out unless at least 1 minute has passed. Defining a precise timing on

Re: Using UDFs in Java without registration

2017-07-26 Thread Justin Uang
Would like to bring this back for consideration again. I'm open to adding types for all the parameters, but it does seem onerous, and in the case of Python, we don't do that. Do you feel strongly about adding them? On Sat, May 30, 2015 at 8:04 PM Reynold Xin wrote: > We

Question on HashJoin trait

2017-07-26 Thread Chang Chen
Hi, I am reading the Spark SQL code. What are streamedPlan and buildPlan in the HashJoin trait for?

    protected lazy val (buildPlan, streamedPlan) = buildSide match {
      case BuildLeft => (left, right)
      case BuildRight => (right, left)
    }
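
As general background on the terminology in the question: in a hash join, the build side is the input that gets loaded into a hash table, and the streamed side is the input that is iterated row by row and probed against that table. The following is a minimal sketch of that idea over in-memory sequences, not Spark's actual implementation. The object name HashJoinSketch, the hashJoin function, and the (Int, String) row shape are all hypothetical and chosen only for illustration; only BuildLeft, BuildRight, and the match expression mirror the quoted snippet.

```scala
// Toy illustration of build side vs. streamed side; NOT Spark's implementation.
object HashJoinSketch {
  sealed trait BuildSide
  case object BuildLeft extends BuildSide
  case object BuildRight extends BuildSide

  // Rows are (key, value) pairs; output is always (key, leftValue, rightValue).
  def hashJoin(
      left: Seq[(Int, String)],
      right: Seq[(Int, String)],
      buildSide: BuildSide): Seq[(Int, String, String)] = {
    // Same selection as in the quoted snippet: pick which input is hashed.
    val (build, streamed) = buildSide match {
      case BuildLeft  => (left, right)
      case BuildRight => (right, left)
    }

    // Build phase: materialize the build side into a hash table keyed by join key.
    val table: Map[Int, Seq[String]] =
      build.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }

    // Probe phase: walk the streamed side once and look up each key in the table.
    for {
      (key, streamedValue) <- streamed
      buildValue <- table.getOrElse(key, Seq.empty[String])
    } yield buildSide match {
      case BuildLeft  => (key, buildValue, streamedValue)
      case BuildRight => (key, streamedValue, buildValue)
    }
  }
}
```

Either choice of build side produces the same joined rows; the choice only affects which input has to fit in the hash table, so the smaller input is the usual candidate for the build side.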