2 options I can think of:
1- Can you perform a union of the DataFrames returned by the Elasticsearch queries? It
would still be distributed, but I don't know whether you will hit a limit on how many
union operations you can perform at a time.
2- Can you use some other API method of Elasticsearch, other than the one which
ret
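Option 1 above (unioning the per-query DataFrames) can be sketched as follows. This is a minimal local example; the literal sequences stand in for the DataFrames your Elasticsearch queries would return, and are not part of any real connector API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionQueryResults {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("union-sketch").getOrCreate()
    import spark.implicits._

    // Stand-ins for the per-query result DataFrames.
    val dfs: Seq[DataFrame] = Seq(
      Seq(("a", 1)).toDF("key", "value"),
      Seq(("b", 2)).toDF("key", "value"),
      Seq(("c", 3)).toDF("key", "value")
    )

    // Fold the per-query results into one DataFrame. `union` is lazy,
    // so the combined plan stays distributed; a very large number of
    // unions produces a deep plan, which is the concern raised above.
    val combined = dfs.reduce(_ union _)
    println(combined.count()) // 3
    spark.stop()
  }
}
```

Note that `union` only requires the schemas to line up positionally, so all per-query DataFrames must share the same column layout.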
I could, but only if I had it beforehand. I do not know what the DataFrame is
until I pass the query parameter and receive the resultant DataFrame inside the
iteration.
The steps are:
Original DF -> Iterate -> Pass every element to a function that takes the
element of the original
I checked the `collect` of `Dataset`; that method calls the `collect` of `RDD` and
applies `decodeUnsafeRows`.
So I think the two `collect` methods are different:
the `collect` of `Dataset` is meant for Spark SQL.
If you really want to use `collectAsync`, please write the following:
`df.rdd.collectAsync`
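A small sketch of that suggestion: `Dataset.collect` is synchronous, but the underlying `RDD` exposes `collectAsync`, which returns a `FutureAction` (a `scala.concurrent.Future`), so the driver can do other work while the job runs.

```scala
import org.apache.spark.sql.SparkSession
import scala.concurrent.Await
import scala.concurrent.duration._

object CollectAsyncSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("collect-async").getOrCreate()
    import spark.implicits._

    val df = Seq(1, 2, 3).toDF("n")

    // df.rdd.collectAsync returns a FutureAction[Seq[Row]];
    // the job runs in the background until we await the result.
    val future = df.rdd.collectAsync()
    val rows = Await.result(future, 60.seconds)
    println(rows.map(_.getInt(0)).sum) // 6
    spark.stop()
  }
}
```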
Could you join() the DFs on a common key?
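A minimal sketch of that join suggestion, with two DataFrames sharing a key column (the column names here are invented for illustration):

```scala
import org.apache.spark.sql.SparkSession

object JoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("join-sketch").getOrCreate()
    import spark.implicits._

    val cities  = Seq((1, "Paris"), (2, "Berlin")).toDF("id", "city")
    val queries = Seq((1, "q1"), (2, "q2")).toDF("id", "query")

    // Inner join on the common key; passing Seq("id") keeps a single
    // "id" column instead of duplicating it from both sides.
    val joined = cities.join(queries, Seq("id"))
    println(joined.count()) // 2
    spark.stop()
  }
}
```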
Shabad, I am not sure what you are trying to say. Could you please give me an
example? The result of the query is a DataFrame that is created after
iterating, so I am not sure how I could map that to a column without iterating
and getting the values.
I have a DataFrame that contains a list
Hi
The RDBMS context is quite broad: it has both large fact tables with
billions of rows as well as hundreds of small normalized tables. Depending
on the Spark transformation, the source data can be one or multiple
tables, and anywhere from a few rows to millions or even billions of them. When new
data is inserte
Yes, you can certainly use Spark Streaming, but reading from the original
source table may still be time-consuming and resource-intensive.
Having some context on the RDBMS platform, the data sizes/volumes involved, and the
tolerable lag (between changes being created and their being processed by Spark)
would help.
Can you have a DataFrame with a column that stores JSON (type string)? Or
you could have a column of array type in which you store all cities
matching your query.
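The array-column idea above can be sketched with `collect_list`: instead of producing a DataFrame per query, keep one DataFrame where each row carries all matching cities in an array-typed column. The column names here are invented for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

object ArrayColumnSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("array-col").getOrCreate()
    import spark.implicits._

    // One row per (query, matching city) pair.
    val matches = Seq(("q1", "Paris"), ("q1", "Lyon"), ("q2", "Berlin"))
      .toDF("query_id", "city")

    // Collapse to one row per query, with all matching cities
    // gathered into an array<string> column.
    val grouped = matches.groupBy("query_id")
      .agg(collect_list("city").as("cities"))
    grouped.show(truncate = false)
    spark.stop()
  }
}
```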
On Fri, Dec 28, 2018 at 2:48 AM wrote:
> Hi community,
>
>
>
> As shown in other answers online , Spark does not support the n
Hi Team,
We are trying to access the endpoint through the library mentioned below and we
get an SSL error. I think internally it uses the KCL library, so if I have to skip
certificate verification, is it possible through a KCL utils call? Because I do not
find any provision to set no-verify=false within spa