From: Jianshi Huang [mailto:jianshi.hu...@gmail.com]
Sent: Monday, October 27, 2014 4:07 PM
To: Shao, Saisai
Cc: user@spark.apache.org; Tathagata Das (t...@databricks.com)
Subject: Re: RDD to DStream

Yeah, you're absolutely right Saisai.
My point is we should allow this kind of logic in RDD, let's say transforming
type RDD[(Key, Iterable[T])] to Seq[(Key, RDD[T])].
Make sense?
Jianshi
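
A minimal driver-side sketch of what such a transformation could look like, assuming the set of keys is small enough to collect. The helper name splitByKey is illustrative, not an existing Spark API, and each returned RDD re-scans the parent when evaluated:

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Illustrative sketch only: builds one lazily-evaluated RDD per key by
// filtering the parent; suitable for small key sets, not production.
def splitByKey[K: ClassTag, T: ClassTag](rdd: RDD[(K, Iterable[T])]): Seq[(K, RDD[T])] = {
  val keys = rdd.map(_._1).distinct().collect()  // key set comes back to the driver
  keys.toSeq.map { k =>
    k -> rdd.filter(_._1 == k).flatMap(_._2)     // one RDD per key, evaluated lazily
  }
}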

... to execute on the remote side, which obviously does not have a SparkContext, so I think Spark cannot support nested RDDs in closures.

Thanks
Jerry

From: Jianshi Huang [mailto:jianshi.hu...@gmail.com]
Sent: Monday, October 27, 2014 3:30 PM
To: Shao, Saisai
Cc: user@spark.apache.org; Tathagata Das (t...@databricks.com)
Subject: Re: RDD to DStream

Ok, back to Scala code, I'm ...

... but you cannot avoid scanning the whole data. Basically we need to avoid fetching a large amount of data back to the driver.

Thanks
Jerry

From: Jianshi Huang [mailto:jianshi.hu...@gmail.com]
Sent: Monday, October 27, 2014 2:39 PM
To: Shao, Saisai
Cc: user@spark.apache.org; Tathagata Das (t...@databricks.com)
Subject: Re: RDD to DStream

Hi Saisai,

I understand it's non-trivial, bu...
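
On the driver-memory point, one pattern worth a sketch (assuming the driver needs the records in timestamp order but should not hold them all at once): RDD.toLocalIterator fetches results one partition at a time, avoiding the full materialization that collect() would cause.

import org.apache.spark.SparkContext._  // pair-RDD implicits (needed before Spark 1.3)
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Sketch: iterate the data on the driver in timestamp order without
// collect()-ing everything; only one partition is resident at a time.
def orderedRecords[T: ClassTag](data: RDD[(Long, T)]): Iterator[(Long, T)] =
  data.sortByKey().toLocalIterator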

From: Jianshi Huang [mailto:jianshi.hu...@gmail.com]
Sent: Monday, October 27, 2014 1:42 PM
To: Tathagata Das
Cc: Aniket Bhatnagar; user@spark.apache.org
Subject: Re: RDD to DStream

I have a similar requirement. But instead of grouping it by chunkSize, I would have the timestamp be part of the data. So the function I want has the following signature:

// RDD of (timestamp, value)
def rddToDStream[T](data: RDD[(Long, T)], timeWindow: Long)(implicit ssc: StreamingContext): DStream[T]
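
A minimal sketch of one way such a function could look, assuming a queueStream-based approach (the body is illustrative, not an existing Spark API): bucket the records by window index, build one RDD per window on the driver, and feed them to the StreamingContext through a queue. Each window's filter re-scans the input, so this is only reasonable for testing.

import scala.collection.mutable
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

def rddToDStream[T: ClassTag](data: RDD[(Long, T)], timeWindow: Long)
                             (implicit ssc: StreamingContext): DStream[T] = {
  // Assign each record to a window and cache, since we scan repeatedly.
  val indexed = data.map { case (ts, v) => (ts / timeWindow, v) }.cache()
  // The set of window indices comes back to the driver (assumed small).
  val windows = indexed.map(_._1).distinct().collect().sorted
  val queue = new mutable.Queue[RDD[T]]()
  windows.foreach { w => queue += indexed.filter(_._1 == w).map(_._2) }
  // One queued RDD is emitted per batch interval.
  ssc.queueStream(queue, oneAtATime = true)
}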

Hey Aniket,

Great thoughts! I understand the use case. But as you have realized yourself, it is not trivial to cleanly stream an RDD as a DStream. Since RDD operations are defined to be scan-based, it is not efficient to define an RDD based on slices of data within a partition of another RDD, using pure ...

The use case for converting an RDD into a DStream is that I want to simulate a stream from already persisted data for testing analytics. It is trivial to create an RDD from any persisted data, but not so much for a DStream. Hence my idea to create a DStream from an RDD. For example, let's say you are tryi...

Nice question :)

Ideally you should use the queueStream interface to push RDDs into a queue, and then Spark Streaming can handle the rest.

Though if you are looking to convert an RDD to a DStream, another workaround folks use is to source the DStream from a folder and move files that they need reprocessed back into the folder.
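
A minimal sketch of that folder-based workaround, assuming text input (the path and app name are illustrative): Spark Streaming's textFileStream only picks up files newly moved into the watched directory, so replaying a dataset is just moving its files back in.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReplayFromFolder {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("replay-demo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    // Only files moved into this directory after start() are picked up.
    val lines = ssc.textFileStream("/data/replay-inbox")
    lines.count().print()  // e.g. show the record count of each batch
    ssc.start()
    ssc.awaitTermination()
  }
}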

Hi everyone,

I haven't been receiving replies to my queries on the distribution list. Not pissed, but I am curious to know whether my messages are actually going through. Can someone please confirm that my messages are getting delivered via this distribution list?
Thanks,
Aniket