HashingTFModel/IDFModel in Structured Streaming

2017-10-16 Thread Davis Varghese
I have built an ML pipeline model on static Twitter data for sentiment analysis. When I use the model on a structured stream, it always throws "Queries with streaming sources must be executed with writeStream.start()". This particular model doesn't contain any documented "unsupported"
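For context, the error above is raised when an eager action (e.g. `show`, `count`, `collect`) is run on a streaming DataFrame instead of starting a streaming query. A minimal sketch of scoring a fitted `PipelineModel` on a stream — the model path, source, and column names here are illustrative, not from the thread:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("StreamingSentiment").getOrCreate()

// Pipeline (e.g. Tokenizer -> HashingTF -> IDF -> classifier) trained on static data
val model = PipelineModel.load("/models/sentiment")  // hypothetical path

val tweets = spark.readStream
  .format("socket")
  .option("host", "localhost").option("port", 9999)
  .load()
  .withColumnRenamed("value", "text")

// transform() is lazy; calling .show()/.count() here would throw the
// "must be executed with writeStream.start()" AnalysisException.
val scored = model.transform(tweets)

val query = scored.writeStream
  .outputMode("append")
  .format("console")
  .start()

query.awaitTermination()
```

The key point is that the scored stream must be consumed through `writeStream.start()`; any batch-style action on `scored` reproduces the exception.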

Is 2.2.1 going to be out soon?

2017-10-16 Thread Lalwani, Jayesh
We have one application that is running into problems because of https://issues.apache.org/jira/browse/SPARK-21696, whose fix is included in 2.2.1. We would appreciate an idea of when it's going to be out.

Graceful node decommission mechanism for Spark

2017-10-16 Thread Juan Rodríguez Hortalá
Hi all, I have a prototype for "Keep track of nodes which are going to be shut down & avoid scheduling new tasks" (https://issues.apache.org/jira/browse/SPARK-20628) that I would like to discuss with the community. I added a WIP PR for that in https://github.com/apache/spark/pull/19267. The

Re: [VOTE][SPIP] SPARK-22026 data source v2 write path

2017-10-16 Thread Wenchen Fan
This vote passes with 3 binding +1 votes, 5 non-binding +1 votes, and no -1 votes. Thanks all! +1 votes (binding): Wenchen Fan, Reynold Xin, Cheng Lian. +1 votes (non-binding): Xiao Li, Weichen Xu, Vaquar Khan, Liwei Lin, Dongjoon Hyun.

Re: Spark Kafka API tries connecting to dead node for every batch, which increases the processing time

2017-10-16 Thread Cody Koeninger
Have you tried the 0.10 integration? I'm not sure how you would know whether a broker is up or down without attempting to connect to it. Do you have an alternative suggestion? Not sure how much interest there is in patches to the 0.8 integration at this point.
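With the 0.10 direct stream, connection behavior toward dead brokers is governed by the underlying Kafka consumer configs. A sketch of the relevant `kafkaParams` — broker addresses, group id, and timeout values are illustrative examples to tune, not recommended settings:

```scala
import org.apache.kafka.common.serialization.StringDeserializer

// Config fragment for KafkaUtils.createDirectStream (spark-streaming-kafka-0-10).
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092,broker2:9092,broker3:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "example-group",
  // Lower these to fail faster when a broker is unreachable:
  "request.timeout.ms"    -> (10000: java.lang.Integer),
  "reconnect.backoff.ms"  -> (1000: java.lang.Integer)
)
```

Shrinking `request.timeout.ms` bounds how long each batch waits on an unresponsive node, at the cost of more spurious retries on a slow but healthy cluster.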

Re: Spark Kafka API tries connecting to dead node for every batch, which increases the processing time

2017-10-16 Thread Suprith T Jain
Yes, I tried that, but it's not that effective. In fact, Kafka's SimpleConsumer tries to reconnect in case of a socket error (sendRequest method), so it'll always be twice the timeout for every window and for every node that is down.

Re: Spark Kafka API tries connecting to dead node for every batch, which increases the processing time

2017-10-16 Thread Cody Koeninger
Have you tried adjusting the timeout?

Spark Kafka API tries connecting to dead node for every batch, which increases the processing time

2017-10-16 Thread Suprith T Jain
Hi guys, I have a 3-node cluster and I am running a Spark Streaming job. Consider the example below: /*spark-submit* --master yarn-cluster --class com.huawei.bigdata.spark.examples.FemaleInfoCollectionPrint --jars
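The command above is cut off in the archive. Its general shape would be as follows; the jar paths and resource numbers here are placeholders, not the poster's actual values:

```shell
# Illustrative spark-submit for a YARN-cluster streaming job.
spark-submit \
  --master yarn-cluster \
  --class com.huawei.bigdata.spark.examples.FemaleInfoCollectionPrint \
  --jars /path/to/spark-streaming-kafka_2.11.jar \
  --num-executors 3 \
  --executor-memory 2g \
  /path/to/application.jar
```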

Re: [VOTE][SPIP] SPARK-22026 data source v2 write path

2017-10-16 Thread Cheng Lian
+1