Re: What are the alternatives to nested DataFrames?

2018-12-28 Thread Shahab Yunus
2 options I can think of: 1- Can you perform a union of the DFs returned by the Elasticsearch queries? It would still be distributed, but I don't know whether you would hit a limit on how many union operations you can perform at a time. 2- Can you use some other API method of Elasticsearch other than which

RE: What are the alternatives to nested DataFrames?

2018-12-28 Thread email
I could, but only if I had it beforehand. I do not know what the dataframe is until I pass the query parameter and receive the resultant dataframe inside the iteration. The steps are: Original DF -> Iterate -> Pass every element to a function that takes the element of the original

Re:Re: Async action in Dataframe

2018-12-28 Thread 大啊
I checked the `collect` of `DataSet`: this method calls the `collect` of `RDD` and applies `decodeUnsafeRows`. So I think the two `collect` functions are different. The `collect` of `DataSet` is used for Spark SQL. If you really want to use `collectAsync`, please code as follows:

Re: What are the alternatives to nested DataFrames?

2018-12-28 Thread Andrew Melo
Could you join() the DFs on a common key? On Fri, Dec 28, 2018 at 18:35 wrote: > Shahab, I am not sure what you are trying to say. Could you please give > me an example? The result of the query is a Dataframe that is created after > iterating, so I am not sure how I could map that to a column

RE: What are the alternatives to nested DataFrames?

2018-12-28 Thread email
Shahab, I am not sure what you are trying to say. Could you please give me an example? The result of the query is a Dataframe that is created after iterating, so I am not sure how I could map that to a column without iterating and getting the values. I have a Dataframe that contains a

Re: jdbc spark streaming

2018-12-28 Thread Nicolas Paris
Hi, the RDBMS context is quite broad: it has both large fact tables with billions of rows and hundreds of small normalized tables. Depending on the Spark transformation, the source data can be one or multiple tables, and anything from a few rows to millions or even billions of them. When new data is

Re: jdbc spark streaming

2018-12-28 Thread Thakrar, Jayesh
Yes, you can certainly use Spark Streaming, but reading from the original source table may still be time-consuming and resource-intensive. Having some context on the RDBMS platform, data size/volumes involved and the tolerable lag (between changes being created and their being processed by Spark)

Re: What are the alternatives to nested DataFrames?

2018-12-28 Thread Shahab Yunus
Can you have a dataframe with a column which stores JSON (type string)? Or you can also have a column of array type in which you store all cities matching your query. On Fri, Dec 28, 2018 at 2:48 AM wrote: > Hi community, > > As shown in other answers online, Spark does not support the

Spark Kinesis Connector SSL issue

2018-12-28 Thread Shashikant Bangera
Hi Team, we are trying to access the endpoint through the library mentioned below and we get an SSL error. I think it uses the KCL library internally. If I have to skip certificate verification, is that possible through a KCL utils call? I do not find any provision to set no-verify=false within