unsubscribe

2018-12-01 Thread Kappaganthu, Sivaram (CORP)


--
This message and any attachments are intended only for the use of the addressee 
and may contain information that is privileged and confidential. If the reader 
of the message is not the intended recipient or an authorized representative of 
the intended recipient, you are hereby notified that any dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, notify the sender immediately by return email and delete the message 
and any attachments from your system.


Re: Convert RDD[Iterrable[MyCaseClass]] to RDD[MyCaseClass]

2018-12-01 Thread Chris Teoh
Hi James,

Try flatMap (_.toList). See below example:-

scala> case class MyClass(i:Int)
defined class MyClass

scala> val r = 1 to 100
r: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)

scala> val r2 = 101 to 200
r2: scala.collection.immutable.Range.Inclusive = Range(101, 102, 103, 104,
105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
195, 196, 197, 198, 199, 200)

scala> val c1 = r.map(MyClass(_)).toIterable
c1: Iterable[MyClass] = Vector(MyClass(1), MyClass(2), MyClass(3),
MyClass(4), MyClass(5), MyClass(6), MyClass(7), MyClass(8), MyClass(9),
MyClass(10), MyClass(11), MyClass(12), MyClass(13), MyClass(14),
MyClass(15), MyClass(16), MyClass(17), MyClass(18), MyClass(19),
MyClass(20), MyClass(21), MyClass(22), MyClass(23), MyClass(24),
MyClass(25), MyClass(26), MyClass(27), MyClass(28), MyClass(29),
MyClass(30), MyClass(31), MyClass(32), MyClass(33), MyClass(34),
MyClass(35), MyClass(36), MyClass(37), MyClass(38), MyClass(39),
MyClass(40), MyClass(41), MyClass(42), MyClass(43), MyClass(44),
MyClass(45), MyClass(46), MyClass(47), MyClass(48), MyClass(49),
MyClass(50), MyClass(51), MyClass(52), MyClass(53), MyClass(54),
MyClass(55), MyClass(56), MyClass(57), MyClass(58), MyClass(59), MyClass(...

scala> val c2 = r2.map(MyClass(_)).toIterable
c2: Iterable[MyClass] = Vector(MyClass(101), MyClass(102), MyClass(103),
MyClass(104), MyClass(105), MyClass(106), MyClass(107), MyClass(108),
MyClass(109), MyClass(110), MyClass(111), MyClass(112), MyClass(113),
MyClass(114), MyClass(115), MyClass(116), MyClass(117), MyClass(118),
MyClass(119), MyClass(120), MyClass(121), MyClass(122), MyClass(123),
MyClass(124), MyClass(125), MyClass(126), MyClass(127), MyClass(128),
MyClass(129), MyClass(130), MyClass(131), MyClass(132), MyClass(133),
MyClass(134), MyClass(135), MyClass(136), MyClass(137), MyClass(138),
MyClass(139), MyClass(140), MyClass(141), MyClass(142), MyClass(143),
MyClass(144), MyClass(145), MyClass(146), MyClass(147), MyClass(148),
MyClass(149), MyClass(150), MyClass(151), MyClass(152), MyClass(153),
MyClass(154), MyClass(15...
scala> val rddIt = sc.parallelize(Seq(c1,c2))
rddIt: org.apache.spark.rdd.RDD[Iterable[MyClass]] =
ParallelCollectionRDD[2] at parallelize at :28

scala> rddIt.flatMap(_.toList)
res4: org.apache.spark.rdd.RDD[MyClass] = MapPartitionsRDD[3] at flatMap at
:26

res4 is what you're looking for.


On Sat, 1 Dec 2018 at 21:09, Chris Teoh  wrote:

> Do you have the full code example?
>
> I think this would be similar to the mapPartitions code flow, something
> like flatMap( _ =>  _.toList )
>
> I haven't yet tested this out but this is how I'd first try.
>
> On Sat, 1 Dec 2018 at 01:02, James Starks 
> wrote:
>
>> When processing data, I create an instance of RDD[Iterable[MyCaseClass]]
>> and I want to convert it to RDD[MyCaseClass] so that it can be further
>> converted to dataset or dataframe with toDS() function. But I encounter a
>> problem that SparkContext can not be instantiated within SparkSession.map
>> function because it already exists, even with allowMultipleContexts set to
>> true.
>>
>> val sc = new SparkConf()
>> sc.set("spark.driver.allowMultipleContexts", "true")
>> new SparkContext(sc).parallelize(seq)
>>
>> How can I fix this?
>>
>> Thanks.
>>
>
>
> --
> Chris
>


-- 
Chris


Re: Convert RDD[Iterrable[MyCaseClass]] to RDD[MyCaseClass]

2018-12-01 Thread Chris Teoh
Do you have the full code example?

I think this would be similar to the mapPartitions code flow, something
like flatMap( _ =>  _.toList )

I haven't yet tested this out but this is how I'd first try.

On Sat, 1 Dec 2018 at 01:02, James Starks 
wrote:

> When processing data, I create an instance of RDD[Iterable[MyCaseClass]]
> and I want to convert it to RDD[MyCaseClass] so that it can be further
> converted to dataset or dataframe with toDS() function. But I encounter a
> problem that SparkContext can not be instantiated within SparkSession.map
> function because it already exists, even with allowMultipleContexts set to
> true.
>
> val sc = new SparkConf()
> sc.set("spark.driver.allowMultipleContexts", "true")
> new SparkContext(sc).parallelize(seq)
>
> How can I fix this?
>
> Thanks.
>


-- 
Chris