> How about something like
>
> scala> val text = (1 to 10).map(i => (i.toString,
> random_string(chars.mkString(""), 10))).toArray
>
> text: Array[(String, String)] = Array((1,FBECDoOoAC), (2,wvAyZsMZnt),
> (3,KgnwObOFEG), (4,tAZPRodrgP), (5,uSgrqyZGuc), (6,ztrTmbkOhO),
> (7,qUbQsKtZWq), (8,JDokbiFzWy), (9,vNHgiHSuUM), (10,CmnFjlHnHx))
>
> scala> sc.parallelize(text).count
> res0: Long = 10
>
> By the way, I'm not sure why you need the UDF registration here.
>
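On the UDF point: as far as I know, spark.udf.register is only needed when the function is called from SQL text. For strings generated on the driver, a plain Scala method should be enough. Below is a minimal sketch in plain Scala (no Spark needed to run it) that builds typed rows with an Int key, closer to the case class in the original post. The names RandomRows, Record and randomString are made up for illustration and are not from the thread.

```scala
import scala.util.Random

object RandomRows {
  // Hypothetical record type, mirroring the (col1: Int, col2: String) case class in the thread.
  final case class Record(col1: Int, col2: String)

  // Alphabet used in the thread: lower-case and upper-case ASCII letters.
  val chars: String = (('a' to 'z') ++ ('A' to 'Z')).mkString

  // Plain method, no UDF registration: pick `length` random characters from `chars`.
  def randomString(chars: String, length: Int): String =
    Seq.fill(length)(chars(Random.nextInt(chars.length))).mkString

  // One Record per key 1..n, each with a 10-character random string.
  def rows(n: Int): Array[Record] =
    (1 to n).map(i => Record(i, randomString(chars, 10))).toArray
}

// In spark-shell (assumption, not run here): sc.parallelize(RandomRows.rows(10))
// should give an RDD[Record] with 10 elements.
```

Because the elements are a case class rather than one long string, each row stays a separate element of the collection.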
>
> On Tue, 23 Aug 2016 at 20:12 Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi gents,
>>
>> Well, I was trying to see whether I can create an array of elements, then
>> go from RDD to DF, register it as a temp table and store it as a Hive table.
>>
>> import scala.util.Random
>> //
>> // UDF to create a random string of charlength characters
>> //
>> def random_string(chars: String, charlength: Int) : String = {
>>   val newKey = (1 to charlength).map(
>>     x =>
>>     {
>>       val index = Random.nextInt(chars.length)
>>       chars(index)
>>     }
>>    ).mkString("")
>>    return newKey
>> }
>> spark.udf.register("random_string", random_string(_:String, _:Int))
>> case class columns (col1: Int, col2: String)
>> val chars = ('a' to 'z') ++ ('A' to 'Z')
>> var text = ""
>> val comma = ","
>> val terminator = "))"
>> var random_char = ""
>> for (i <- 1 to 10) {
>>   random_char = random_string(chars.mkString(""), 10)
>>   if (i < 10) {
>>     text = text + """(""" + i.toString + ""","""" + random_char + """")""" + comma
>>   } else {
>>     text = text + """(""" + i.toString + ""","""" + random_char + """")"""
>>   }
>> }
>> println(text)
>> val df = sc.parallelize((Array(text)))
>>
>>
>> Unfortunately, Spark sees that only as a single string and treats it as text.
>>
>> I can write it easily as a shell script with ${text} passed to Array and
>> it will work. I was wondering if I could do this in Spark/Scala with my
>> limited knowledge.
>>
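The shell version presumably works because ${text} is expanded into the source before Scala ever sees it, whereas here text is a runtime String value, so Array(text) is an Array[String] with exactly one element. If the string form really is the starting point, one option is to parse it back into tuples. This is only a sketch under the assumption that the text always has the (digits,"letters") shape printed in the thread. ParseTuples and its members are hypothetical names.

```scala
object ParseTuples {
  // Matches one group of the form (<digits>,"<anything but a double quote>").
  private val Pair = """\((\d+),"([^"]*)"\)""".r

  // Turn the concatenated text back into an array of (Int, String) tuples.
  def parse(text: String): Array[(Int, String)] =
    Pair.findAllMatchIn(text).map(m => (m.group(1).toInt, m.group(2))).toArray

  // Shortened sample taken from the output shown in the thread.
  val sample: String = """(1,"hNjLJEgjxn"),(2,"lgryHkVlCN"),(3,"ukswqcanVC")"""

  // Wrapping the string in an Array still gives a single element ...
  val asOneElement: Array[String] = Array(sample)

  // ... while parsing it recovers one tuple per group.
  val asTuples: Array[(Int, String)] = parse(sample)
}
```

Building the tuples directly, without going through a string at all, is simpler when the data is generated in the same program.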
>> Cheers
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 23 August 2016 at 19:00, Nick Pentreath <nick.pentre...@gmail.com>
>> wrote:
>>
>>> What is "text"? That is, what is the "val text = ..." definition?
>>>
>>> If text is a String itself then indeed sc.parallelize(Array(text)) is
>>> doing the correct thing in this case.
>>>
>>>
>>> On Tue, 23 Aug 2016 at 19:42 Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> I am sure someone knows this :)
>>>>
>>>> I created a dynamic text string which has the format
>>>>
>>>> scala> println(text)
>>>>
>>>> (1,"hNjLJEgjxn"),(2,"lgryHkVlCN"),(3,"ukswqcanVC"),(4,"ZFULVxzAsv"),(5,"LNzOozHZPF"),(6,"KZPYXTqMkY"),(7,"DVjpOvVJTw"),(8,"LKRYrrLrLh"),(9,"acheneIPDM"),(10,"iGZTrKfXNr")
>>>>
>>>> now if I do
>>>>
>>>> scala> val df =
>>>> sc.parallelize((Array((1,"hNjLJEgjxn"),(2,"lgryHkVlCN"),(3,"ukswqcanVC"),(4,"ZFULVxzAsv"),(5,"LNzOozHZPF"),(6,"KZPYXTqMkY"),(7,"DVjpOvVJTw"),(8,"LKRYrrLrLh"),(9,"acheneIPDM"),(10,"iGZTrKfXNr"))))
>>>> df: org.apache.spark.rdd.RDD[(Int, String)] =
>>>> ParallelCollectionRDD[230] at parallelize at <console>:39
>>>> scala> df.count
>>>> res157: Long = 10
>>>> It shows ten Array elements, which is correct.
>>>>
>>>> Now if I pass that text into Array it only sees one row
>>>>
>>>> scala> val df = sc.parallelize((Array(text)))
>>>> df: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[228] at
>>>> parallelize at <console>:41
>>>> scala> df.count
>>>> res158: Long = 1
>>>>
>>>> Basically it sees it as one element of the array
>>>>
>>>> scala> df.first
>>>> res165: String =
>>>> (1,"hNjLJEgjxn"),(2,"lgryHkVlCN"),(3,"ukswqcanVC"),(4,"ZFULVxzAsv"),(5,"LNzOozHZPF"),(6,"KZPYXTqMkY"),(7,"DVjpOvVJTw"),(8,"LKRYrrLrLh"),(9,"acheneIPDM"),(10,"iGZTrKfXNr")
>>>> Which is not what I want.
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> This works fine
>>>>
>>>
>>
