Converting Apache log string into map using delimiter
I have an RDD of logs whose rows look like this:

/no_cache/bi_event?Log=0,pg_inst=517638988975678942,pg=fow_mw,ver=c.2.1.8,site=xyz.com,pid=156431807121222351,rid=156431666543211500,srch_id=156431666581865115,row=6,seq=1,tot=1,tsp=1,cmp=thmb_12,co_txt_url=Viewing,et=click,thmb_type=,pct=,uc=579855,lnx=SPGOOGBRANDCAMP,ref_url=http%3A%2F%2Fwww.abcd.com

The pairs are separated by "," and the key and value of each pair are separated by "=". Hive has a str_to_map function (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions) that converts this string to a map, so that:

mappedString["site"]

returns xyz.com. What's the most efficient way to do this in Scala + Spark?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Converting-Apache-log-string-into-map-using-delimiter-tp18641.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
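For reference, a str_to_map equivalent can be written as one small function in plain Scala and passed to rdd.map. This is a minimal sketch on a local String (the sample line below is made up, not the real log); splitting each pair with a limit of 2 keeps empty values and any "=" characters inside values intact.

```scala
object StrToMapSketch {
  // Parse "k1=v1,k2=v2,..." into a Map, like Hive's str_to_map(line, ",", "=").
  def parseToMap(line: String): Map[String, String] =
    line.split(",").iterator.map { pair =>
      // limit 2: "pct=" yields Array("pct", "") instead of dropping the value
      val Array(k, v) = pair.split("=", 2)
      k -> v
    }.toMap

  def main(args: Array[String]): Unit = {
    val logLine = "Log=0,site=xyz.com,pct=,row=6" // hypothetical sample
    val m = parseToMap(logLine)
    println(m("site")) // prints xyz.com
  }
}
```

In Spark this would be applied per record, e.g. rdd.map(StrToMapSketch.parseToMap), so each line is scanned once.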
Re: Converting Apache log string into map using delimiter
OK, I got it working with:

z.map(row => (row.map(element => element.split("=")(0)) zip row.map(element => element.split("=")(1))).toMap)

But I'm guessing there is a more efficient way than creating two separate lists, zipping them together, and converting the result into a map.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Converting-Apache-log-string-into-map-using-delimiter-tp18641p18643.html
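The zip approach above can be sketched as a runnable example on a local List standing in for the RDD (the sample tokens are made up). Note that it calls split twice per token, and split("=")(1) would throw on a pair with an empty value such as "pct=", since trailing empty strings are dropped.

```scala
object ZipApproachDemo {
  // Each row is a log record already split into "key=value" tokens.
  // Two passes: one collecting keys, one collecting values, then zip.
  def rowToMap(row: List[String]): Map[String, String] =
    (row.map(element => element.split("=")(0)) zip
     row.map(element => element.split("=")(1))).toMap

  def main(args: Array[String]): Unit = {
    val z = List(List("site=xyz.com", "row=6")) // local stand-in for the RDD
    println(z.map(rowToMap).head("site")) // prints xyz.com
  }
}
```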
Re: Converting Apache log string into map using delimiter
I think it would be faster/more compact as:

z.map(_.map { element =>
  val tokens = element.split("=")
  (tokens(0), tokens(1))
}.toMap)

(That's probably 95% right, but I didn't compile or test it.)

On Wed, Nov 12, 2014 at 12:18 AM, YaoPau <jonrgr...@gmail.com> wrote:
> [quoted message trimmed]
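The single-pass version suggested above can be hardened slightly: using split("=", 2) means pairs with empty values (e.g. "pct=") still yield two tokens instead of throwing an ArrayIndexOutOfBoundsException. A minimal runnable sketch on a local List (sample tokens are made up; in Spark the same function would be passed to rdd.map):

```scala
object SinglePassParse {
  // One split per token; limit 2 tolerates empty values like "pct=".
  def rowToMap(row: List[String]): Map[String, String] =
    row.map { element =>
      val tokens = element.split("=", 2)
      (tokens(0), tokens(1))
    }.toMap

  def main(args: Array[String]): Unit = {
    val z = List(List("site=xyz.com", "pct=", "row=6"))
    println(z.map(rowToMap).head("site")) // prints xyz.com
  }
}
```

This splits each token exactly once, avoiding the double scan of the zip version.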