Hi All,
I am wondering if there is a way to persist the lineage generated by Spark
underneath? Some of our clients want us to prove that the result of the
computation we are showing on a dashboard is correct, and for that it would
help if we could show the lineage of transformations that were executed to get to th
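[For anyone looking for a starting point: Spark itself exposes `rdd.toDebugString` and `df.queryExecution` for inspecting lineage, and those strings can be written out next to the results. The general idea of persisting a lineage alongside a value can be sketched in plain Scala; the `Traced`/`via` names below are hypothetical, not a Spark API.]

```scala
// Toy sketch (hypothetical helper, not Spark): carry a textual lineage log
// alongside a value as transformations are applied, so the log can be
// persisted next to the dashboard result as evidence.
case class Traced[A](value: A, lineage: List[String]) {
  def via[B](step: String)(f: A => B): Traced[B] =
    Traced(f(value), lineage :+ step)
}

object LineageDemo {
  def main(args: Array[String]): Unit = {
    val result = Traced(Seq(1, 2, 3, 4), Nil)
      .via("filter(_ % 2 == 0)")(_.filter(_ % 2 == 0))
      .via("map(_ * 10)")(_.map(_ * 10))
    println(result.value.mkString(","))      // 20,40
    println(result.lineage.mkString(" -> ")) // filter(_ % 2 == 0) -> map(_ * 10)
  }
}
```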
You probably don't want null safe equals (<=>) with a left join.
On Mon, Apr 3, 2017 at 5:46 PM gjohnson35 wrote:
> The join condition with && is throwing an exception:
>
> val df = baseDF.join(mccDF, mccDF("medical_claim_id") <=>
> baseDF("medical_claim_id")
> && mccDF("medical_claim_det
The join condition with && is throwing an exception:
val df = baseDF.join(mccDF,
    mccDF("medical_claim_id") <=> baseDF("medical_claim_id")
      && mccDF("medical_claim_detail_id") <=> baseDF("medical_claim_detail_id"),
    "left")
  .join(revCdDF, revCdDF("revenue_code_padded_str") <=> mccDF("
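[For context on the `<=>` advice above: plain SQL equality (`===` in Spark) evaluates to NULL when either side is NULL, so NULL keys never match; null-safe equality (`<=>`) treats two NULLs as equal, which in a left join can match rows on NULL keys that you probably don't intend. A plain-Scala model of the two comparisons, with `Option`/`None` standing in for SQL NULL; the helper names are mine, not Spark's:]

```scala
// Model SQL three-valued equality with Option[Boolean]; None stands for SQL NULL.
object NullSafeEqDemo {
  // Plain SQL '=' (Spark ===): NULL if either operand is NULL.
  def sqlEq[A](a: Option[A], b: Option[A]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // Null-safe '<=>': never NULL; two NULLs compare as true.
  def nullSafeEq[A](a: Option[A], b: Option[A]): Boolean =
    a == b

  def main(args: Array[String]): Unit = {
    println(sqlEq(Option(1), Option(1)))  // Some(true)
    println(sqlEq[Int](None, Option(1)))  // None  -> row would NOT join
    println(sqlEq[Int](None, None))       // None  -> row would NOT join
    println(nullSafeEq[Int](None, None))  // true  -> NULL keys DO match
  }
}
```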
Yes, adding the timeout config should be the only code change required.
And just to clarify, this is for reconnecting with Mesos master (not
agents) after failover.
Tim
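[If I'm reading Tim's suggestion right, the timeout in question is the framework failover timeout; in recent Spark versions this is exposed as `spark.mesos.driver.failoverTimeout` (verify it exists in your version's Spark-on-Mesos docs). A hedged sketch of the submit; the ZooKeeper URL and jar name are placeholders:]

```shell
# Sketch: keep the driver framework registered with the Mesos master so it
# can reconnect after a master failover, instead of being torn down.
# (Property name per the Spark-on-Mesos configuration docs; check your version.)
spark-submit \
  --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos \
  --deploy-mode cluster \
  --conf spark.mesos.driver.failoverTimeout=3600 \
  my-job.jar
```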
On Mon, Apr 3, 2017 at 2:23 PM, Charles Allen wrote:
> We had investigated internally recently why restarting the mesos agents
We had recently investigated internally why restarting the Mesos agents
failed the Spark jobs (no real reason they should, right?) and came across
this data. The other conversation by Yu prompted trying to poke at getting
some of the tickets updated, to spread around any tribal knowledge that is
floating
The only reason is that MesosClusterScheduler is by design long-running,
so we really needed it to have failover configured correctly.
I wanted to create a JIRA ticket to allow users to configure it for
each Spark framework, but just didn't remember to do so.
Per another question that came up in t
As per https://issues.apache.org/jira/browse/SPARK-4899,
org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils#createSchedulerDriver
allows checkpointing, but only
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler uses it.
Is there a reason for that?
Hi,
I made some progress in binding the expressions to a LogicalPlan and then
analyzing the plan.
The problem is the unique IDs that are assigned to every expression.
def apply(dataFrame: DataFrame,
    selectExpressions: java.util.List[String]): RDD[InternalRow] = {
  val schema = dataFrame.schema
  val
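[On the unique-ID problem: Catalyst mints a fresh exprId for every attribute when an expression tree is created, so expressions parsed later don't compare equal to the attributes already sitting in an analyzed plan, and binding has to go through name resolution instead. A stripped-down model of why; the `Attr`/`parse` names are mine, not Catalyst's:]

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy model of Catalyst-style attributes: same name, but identity is the exprId.
object ExprIdDemo {
  private val counter = new AtomicLong(0)

  case class Attr(name: String, exprId: Long)

  // Each "parse" mints fresh IDs, just as Catalyst does for new expression trees.
  def parse(name: String): Attr = Attr(name, counter.incrementAndGet())

  def main(args: Array[String]): Unit = {
    val fromPlan = parse("medical_claim_id") // attribute inside the analyzed plan
    val fromUser = parse("medical_claim_id") // same text, parsed again later
    println(fromPlan == fromUser)            // false: different exprIds
    println(fromPlan.name == fromUser.name)  // true: binding must match by name
  }
}
```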