Hi!

I would like to use tensorframes in my pyspark notebook.

I have performed the following:

1. In the Spark interpreter, added a new repository:
http://dl.bintray.com/spark-packages/maven
2. In the Spark interpreter, added the dependency
databricks:tensorframes:0.2.9-s_2.11
3. Ran pip install tensorframes


In both 0.7.3 and 0.8.0:
1. The following code resulted in the error "ImportError: No module named
tensorframes":

%pyspark
import tensorframes as tfs

2. The following code succeeded:
%spark
import org.tensorframes.{dsl => tf}
import org.tensorframes.dsl.Implicits._
val df = spark.createDataFrame(Seq(1.0->1.1, 2.0->2.2)).toDF("a", "b")

// As in Python, scoping is recommended to prevent name collisions.
val df2 = tf.withGraph {
    val a = df.block("a")
    // Unlike python, the scala syntax is more flexible:
    val out = a + 3.0 named "out"
    // The 'mapBlocks' method is added using implicits to dataframes.
    df.mapBlocks(out).select("a", "out")
}

// The transform is all lazy at this point, let's execute it with collect:
df2.collect()

I ran the code above directly in the Spark interpreter with the default
configuration (master set to local[*], so not via the spark-submit
command).

I have also installed Spark locally (with SPARK_HOME set) and ran the command

$SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.2.9-s_2.11

and the code below worked as expected:

import tensorframes as tfs
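
In case it helps with diagnosing this, below is a minimal sketch of a check
that could be run both in a %pyspark paragraph and in the local pyspark shell,
to compare which Python executable each one uses and whether it can locate
the module. The helper name module_location is made up for this sketch; it
relies only on the Python standard library and assumes nothing about the
tensorframes API itself.

```python
import importlib
import sys


def module_location(name):
    """Return the file a module is loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        # Same failure class as the "No module named ..." error reported above.
        return None
    return getattr(module, "__file__", None)


# Which Python binary is this interpreter actually running?
print("python executable: %s" % sys.executable)

# Where (if anywhere) does that Python find tensorframes?
print("tensorframes location: %s" % module_location("tensorframes"))

# The search path shows which site-packages directories are visible.
for entry in sys.path:
    print("sys.path entry: %s" % entry)
```

If the executable or the search path differs between the two environments,
that would suggest pip installed the package for a different Python than the
one the notebook interpreter uses, but I have not confirmed that this is the
cause.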

Can you please help me solve this?

Thanks,

 Florin
