There was a bug in the devices line: the join key dh.index('id') should have been x[dh.index('id')]. The column index is a constant, so every pair was keyed by 0 instead of by the row's device id, and the join could never match.
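For the archive, here is a minimal Spark-free sketch of that fix. The header list dh and the rows are made-up stand-ins for the real data; only the keying logic is the point:

```python
# Plain-Python sketch of the keying bug; dh and rows are hypothetical
# stand-ins for the real header list and the RDD's rows.
dh = ['id', 'foo', 'bar']
rows = [['dev1', 'f1', 'b1'], ['dev2', 'f2', 'b2']]

# Buggy: dh.index('id') is the constant 0, so every pair gets key 0
# and a join on device id can never line up with the other dataset.
buggy = [(dh.index('id'), {'deviceid': x[dh.index('id')]}) for x in rows]

# Fixed: key each pair by the row's actual id value, x[dh.index('id')].
fixed = [(x[dh.index('id')], {'deviceid': x[dh.index('id')],
                              'foo': x[dh.index('foo')],
                              'bar': x[dh.index('bar')]}) for x in rows]

print([k for k, v in buggy])   # [0, 0]
print([k for k, v in fixed])   # ['dev1', 'dev2']
```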
On Fri, Oct 17, 2014 at 5:52 PM, Russell Jurney <russell.jur...@gmail.com> wrote:

> Is that not exactly what I've done in j3/j4? The keys are identical
> strings. The k is the same; the value in both instances is an associative
> array.
>
> devices = devices.map(lambda x: (dh.index('id'),
>     {'deviceid': x[dh.index('id')],
>      'foo': x[dh.index('foo')],
>      'bar': x[dh.index('bar')]}))
> bytes_in_out = transactions.map(lambda x: (x[th.index('deviceid')],
>     {'deviceid': x[th.index('deviceid')],
>      'foo': x[th.index('foo')],
>      'bar': x[th.index('bar')],
>      'hello': x[th.index('hello')],
>      'world': x[th.index('world')]}))
>
> j3 = bytes_in_out.join(devices, 10)
> j3.take(1)
> j4 = devices.join(bytes_in_out, 10)
> j4.take(1)
>
> On Fri, Oct 17, 2014 at 5:48 PM, Davies Liu <dav...@databricks.com> wrote:
>
>> Hey Russell,
>>
>> join() can only work with RDDs of pairs (key, value), such as
>>
>> rdd1: (k, v1)
>> rdd2: (k, v2)
>>
>> rdd1.join(rdd2) will be (k, (v1, v2))
>>
>> Spark SQL will be more useful for you, see
>> http://spark.apache.org/docs/1.1.0/sql-programming-guide.html
>>
>> Davies
>>
>> On Fri, Oct 17, 2014 at 5:01 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>
>>> https://gist.github.com/rjurney/fd5c0110fe7eb686afc9
>>>
>>> Any way I try to join my data fails. I can't figure out what I'm doing
>>> wrong.

--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
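The pair-join semantics Davies describes above can be sketched without Spark. This is a toy dict-based inner join, not PySpark itself; the keys and values are invented for illustration, but the output shape (k, (v1, v2)) matches what RDD.join produces:

```python
# Spark-free sketch of joining two keyed datasets, per the thread:
# rdd1 holds (k, v1) pairs, rdd2 holds (k, v2) pairs, and the join
# yields (k, (v1, v2)) for each key present in both.
rdd1 = [('dev1', {'foo': 'f1'}), ('dev2', {'foo': 'f2'})]
rdd2 = [('dev1', {'hello': 'h1'})]

# Build a lookup table from the smaller side, then match keys.
lookup = dict(rdd2)
joined = [(k, (v1, lookup[k])) for k, v1 in rdd1 if k in lookup]

print(joined)  # [('dev1', ({'foo': 'f1'}, {'hello': 'h1'}))]
```

Note that 'dev2' drops out because it has no match in rdd2, just as an inner join on RDDs keeps only keys present on both sides.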