Hi Kay,
Thank you for the detailed explanation.
If I understand correctly, I *could* time the processing of each record by
measuring the time spent in reader.next, but this would add overhead for
every single record. And this is the method that was abandoned because of
performance regressions.
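For concreteness, here is a minimal sketch of that per-record approach (the
class and field names are my illustration, not existing Spark code). Every
record pays for two System.nanoTime() calls, which is presumably where the
regression came from:

```scala
// Sketch of a wrapper that measures time spent in the underlying
// reader's next(). Class and member names are illustrative only.
class TimedIterator[T](underlying: Iterator[T]) extends Iterator[T] {
  private var readTimeNanos = 0L

  // Total time spent inside underlying.next(), in nanoseconds.
  def totalReadTimeNanos: Long = readTimeNanos

  override def hasNext: Boolean = underlying.hasNext

  override def next(): T = {
    val start = System.nanoTime()
    val record = underlying.next()              // the read being measured
    readTimeNanos += System.nanoTime() - start  // two timer calls per record
    record
  }
}
```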
reminder: this is happening tomorrow morning!
7am PDT: builds paused
8am PDT: master reboot, upgrade happens
9am PDT: builds restarted
On Mon, May 9, 2016 at 4:17 PM, shane knapp wrote:
> reminder: this is happening thursday morning.
>
> On Wed, May 4, 2016 at 11:38 AM, shane knapp wrote:
Adding Kay
On Wed, May 11, 2016 at 12:01 PM, Brian Cho wrote:
> Hi,
>
> I'm interested in adding read-time (from HDFS) to Task Metrics. The
> motivation is to help debug performance issues. After some digging, I found
> it briefly mentioned in SPARK-1683 that this feature didn't make it because
> metric collection caused a performance regression [1].
Dear Spark developers,
Recently, I was trying to switch my code from RDDs to DataFrames in order to
compare the performance. The code computes an RDD in a loop. I use RDD.persist
followed by RDD.count to force Spark to compute the RDD and cache it, so that
it does not need to be re-computed on each iteration.
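Concretely, the loop looks something like this (a minimal sketch with a
placeholder body, not my actual code):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the persist-then-count pattern described above.
// The iteration body (a trivial map) is a placeholder.
object PersistCountLoop {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("persist-count").setMaster("local[*]"))
    var current = sc.parallelize(1 to 1000000)
    for (_ <- 1 to 5) {
      val next = current.map(_ + 1)
      next.persist()      // mark the new RDD for caching
      next.count()        // action: forces computation, populating the cache
      current.unpersist() // release the previous iteration's cached data
      current = next
    }
    sc.stop()
  }
}
```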
Hi,
I'm interested in adding read-time (from HDFS) to Task Metrics. The
motivation is to help debug performance issues. After some digging, I found
it briefly mentioned in SPARK-1683 that this feature didn't make it because
metric collection caused a performance regression [1].
I'd like to try tackling
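One direction I could imagine for cutting the overhead (purely my own sketch,
not anything proposed in SPARK-1683) is to time only a sample of records and
extrapolate, trading some accuracy for far fewer timer calls:

```scala
// Illustrative only: time every Nth call to next() and scale up,
// so most records skip the timer entirely.
class SampledTimingIterator[T](underlying: Iterator[T], sampleEvery: Int = 100)
    extends Iterator[T] {
  private var calls = 0L
  private var sampledNanos = 0L

  // Estimated total read time, extrapolated from the sampled calls.
  def estimatedReadTimeNanos: Long = sampledNanos * sampleEvery

  override def hasNext: Boolean = underlying.hasNext

  override def next(): T = {
    calls += 1
    if (calls % sampleEvery == 0) {
      val start = System.nanoTime()
      val record = underlying.next()
      sampledNanos += System.nanoTime() - start
      record
    } else {
      underlying.next()
    }
  }
}
```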
This may be related to: https://issues.apache.org/jira/browse/SPARK-13773
Regards,
James
On 11 May 2016 at 15:49, Ted Yu wrote:
> In the master branch, the behavior is the same.
>
> Suggest opening a JIRA if you haven't done so.
>
> On Wed, May 11, 2016 at 6:55 AM, Tony Jin wrote:
>
>> Hi guys,
>>
>>
In the master branch, the behavior is the same.
Suggest opening a JIRA if you haven't done so.
On Wed, May 11, 2016 at 6:55 AM, Tony Jin wrote:
> Hi guys,
>
> I have a problem with Spark DataFrames. My Spark version is 1.6.1.
> Basically, I used a UDF and df.withColumn to create a "new" column, and then
Hi guys,
I have a problem with Spark DataFrames. My Spark version is 1.6.1.
Basically, I used a UDF and df.withColumn to create a "new" column, and then
I filter the values on this new column and call show (an action). I see the
UDF (which is used by withColumn to create the new column) is executed
multiple times.
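A minimal way to reproduce what I am seeing (sketched from memory; the names
and data are illustrative, not my real code):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

// Minimal sketch of the pattern described above (1.6-era API).
// The println makes repeated evaluation of the UDF visible in the output.
object UdfWithColumnFilter {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("udf-repro").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = Seq(("a", 1), ("b", 2)).toDF("value", "num")

    val tag = udf { (s: String) =>
      println(s"udf called for: $s") // fires repeatedly if re-evaluated
      s + "_tagged"
    }

    val withCol = df.withColumn("tagged", tag($"value"))
    // Filtering on the derived column lets the optimizer substitute the UDF
    // expression into the filter, so the UDF can run again for the same row.
    withCol.filter($"tagged".endsWith("_tagged")).show()
  }
}
```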
Please see this thread:
http://search-hadoop.com/m/q3RTt9XAz651PiG/Adhoc+queries+spark+streaming&subj=Re+Adhoc+queries+on+Spark+2+0+with+Structured+Streaming
> On May 11, 2016, at 1:47 AM, Ofir Manor wrote:
>
> Hi,
> I'm trying out Structured Streaming from the current 2.0 branch.
> Does the branch currently support Kafka as either source or sink?
Hi,
I'm trying out Structured Streaming from the current 2.0 branch.
Does the branch currently support Kafka as either source or sink? I
couldn't find a specific JIRA or design doc for that in SPARK-8360 or in
the examples... Is it still targeted for 2.0?
Also, I naively assume it will look similar to
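For example, I would naively expect something along these lines. This is
purely my guess at the shape: the "kafka" format name and its options are
made up here, not something I have confirmed exists in the 2.0 branch:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch only: a Kafka source plugged into the readStream API.
object KafkaSourceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-sketch")
      .master("local[*]")
      .getOrCreate()

    val stream = spark.readStream
      .format("kafka")                                       // assumed format
      .option("kafka.bootstrap.servers", "localhost:9092")   // assumed option
      .option("subscribe", "events")                         // assumed option
      .load()

    // Echo the raw values to the console, just to have a complete pipeline.
    val query = stream.selectExpr("CAST(value AS STRING)")
      .writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```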