Hello, I could use some assistance with testing out Ignite over Spark,
specifically SQL over RDD objects. I am able to load up an IgniteRDD with
tuples and do some aggregations over it. However, when I try to invoke
.sql() on the IgniteRDD, I get errors.

Here is my test, distilled down to the basics. Say I have my case class:
    package examples

    case class MyObject(someint: Int, somestring: String)
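(One thing I suspect might matter: from what I've read in the Ignite docs, only fields carrying the @QuerySqlField annotation become SQL columns. A sketch of what I believe the queryable version of the class would need to look like, assuming ignite-core's annotations are on the classpath and that the meta-annotation is needed to put them on the underlying field rather than the constructor parameter:)

```scala
package examples

import org.apache.ignite.cache.query.annotations.QuerySqlField
import scala.annotation.meta.field

// Sketch: my understanding is that only annotated fields are exposed
// as SQL columns; @field moves the annotation onto the JVM field so
// Ignite's reflection can see it.
case class MyObject(
  @(QuerySqlField @field)(index = true) someint: Int,
  @(QuerySqlField @field) somestring: String
)
```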
And cache definition:
    <property name="cacheConfiguration">
      <list>
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
          <property name="name" value="cache"/>
          <property name="indexedTypes">
            <list>
              <value>org.apache.ignite.lang.IgniteUuid</value>
              <value>examples.MyObject</value>
            </list>
          </property>
        </bean>
      </list>
    </property>
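(Side note on the serialization issue: if I understand the ignite-spark API correctly, IgniteContext also has a constructor that takes a () => IgniteConfiguration closure, which should sidestep the serializability problem entirely. A sketch, where the cache name and indexed types mirror the XML above and everything else is my assumption:)

```scala
import org.apache.ignite.spark.IgniteContext
import org.apache.ignite.configuration.{IgniteConfiguration, CacheConfiguration}
import org.apache.ignite.lang.IgniteUuid
import examples.MyObject

// Sketch: the closure is evaluated where it is needed, so the
// non-serializable IgniteConfiguration never has to cross the wire.
val igniteContext = new IgniteContext[IgniteUuid, MyObject](sc, () => {
  val cacheCfg = new CacheConfiguration[IgniteUuid, MyObject]("cache")
  cacheCfg.setIndexedTypes(classOf[IgniteUuid], classOf[MyObject])
  new IgniteConfiguration().setCacheConfiguration(cacheCfg)
})
```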
I am setting up indexedTypes via XML since IgniteConfiguration is not
serializable, and therefore can't be configured programmatically in my
Spark job. So then I switch over to the job, which looks something like
this (again distilled for brevity):
    import org.apache.ignite.spark.IgniteContext
    import org.apache.ignite.lang.IgniteUuid
    import org.apache.ignite.configuration._
    import examples.MyObject

    val igniteContext = new IgniteContext[IgniteUuid, MyObject](sc,
      "/usr/lib/ignite/config/default-config.xml", false)
    val cache = igniteContext.fromCache("cache")
    val input = igniteContext.sparkContext.textFile("hdfs://somefile")
    val rdd = input.map(s => s.split("\\|"))
      .map(line => new MyObject(line(0).toInt, line(1)))
    cache.saveValues(rdd)
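(One thing I'm wondering about here: since the cache is indexed as (IgniteUuid, MyObject), maybe saveValues generates keys of a different type than the one I declared. A sketch of what I'd try instead with explicit keys via savePairs, assuming IgniteUuid.randomUuid() is the right factory method:)

```scala
import org.apache.ignite.lang.IgniteUuid
import examples.MyObject

// Sketch: store explicit (IgniteUuid, MyObject) pairs so the entry's
// key type matches the first indexedTypes entry in the cache config.
val pairs = input.map(s => s.split("\\|"))
  .map(line => (IgniteUuid.randomUuid(), new MyObject(line(0).toInt, line(1))))
cache.savePairs(pairs)
```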
At this point I can interact with the cache as an RDD, but not through
IgniteRDD.sql. I get either empty results or match errors, depending on my
declared key type. I have tried several techniques with different key
types and different tuples.

Figuring I needed a schema, I changed the cache indexedTypes and tried
something like:
    import org.apache.spark.sql._
    import org.apache.spark.sql.types._

    val schema = DataTypes.createStructType(Array(
      StructField("someid", StringType, true),
      StructField("somestring", StringType, false)))
    val rows = input.map(s => s.split("\\|"))
      .map(line => RowFactory.create(line(0), line(1)))
    val df = igniteContext.sqlContext.createDataFrame(rows, schema)
    cache.saveValues(df.rdd)
    val rs = cache.sql("select * from Row limit 5") // rs.count = 0
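(For the query itself, my understanding is that the table name is the value class's simple name, i.e. MyObject, not Row, so with the original case class stored I would expect something like the sketch below to work; the column names and filter are just illustrative:)

```scala
// Sketch: IgniteRDD.sql queries the table for the indexed value type,
// whose name should be the simple class name of that type.
val rs = cache.sql(
  "select someint, somestring from MyObject where someint > ?", 10)
rs.show()
```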
I have also tried .map() with the cache config key type, but in the end I
still cannot get results out of the cache via IgniteRDD.sql.

Where am I going wrong? I see items in the cache and in the RDD; take(n)
returns the expected objects, but that is where my success ends. If you
have any pointers on how object fields map to table columns, that part is
a bit frustrating too.
Hopefully the Spark-specific documentation thickens soon; I know it's new,
but for now I could use some tips! Thanks!
--
View this message in context:
http://apache-ignite-users.70518.x6.nabble.com/Guidance-on-Ignite-Spark-SQL-Functionality-tp3138.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.