Hi Andrey,

Thanks a lot for your help. Unfortunately, I cannot use case classes, because the schema information is only available at runtime. Let me add some details to make this clearer.

Suppose I have a very big dataset (~500 TB) stored in AWS S3 in Parquet format. Using Spark, I can process it (filter + join) and reduce its size down to roughly 200-500 GB. I would like to save the resulting dataset in an Ignite cache using IgniteRDD and create indexes on a particular set of fields that will later be used for running queries (filter, join, aggregations). My assumption is that keeping this result dataset in Ignite with indexes would improve performance compared to using a persisted Spark DataFrame.

Unfortunately, the schema of the resulting dataset can vary in a great number of ways, so it seems impossible to describe all the variants with case classes. This is why an approach that stores spark.sql.Row and describes the query fields and indexes using QueryEntity would be preferable. Thanks to your explanation, I now see that this approach doesn't work.

Another solution that is spinning in my head is to generate case classes dynamically (at runtime) based on the Spark DataFrame schema, map the sql.Rows to an RDD[generated_case_class], describe the Ignite query and index fields using QueryEntity, and create an IgniteContext for the generated case class. I'm not sure this approach is even possible, so I would like to ask for your opinion before I go deeper. I would be very grateful for any advice.
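For reference, this is roughly how I was hoping to describe the query fields and indexes with QueryEntity (the field names and cache name here are hypothetical, derived at runtime from the DataFrame schema; it is a sketch, not working code, since as you explained the Row value type does not work):

```scala
import java.util.Collections

import org.apache.ignite.cache.{QueryEntity, QueryIndex}
import org.apache.ignite.configuration.CacheConfiguration
import org.apache.spark.sql.Row

// Hypothetical field set, built at runtime from the DataFrame schema.
val fields = new java.util.LinkedHashMap[String, String]()
fields.put("userId", "java.lang.Long")
fields.put("country", "java.lang.String")

// Describe the queryable fields and an index over "country".
val entity = new QueryEntity()
  .setKeyType("java.lang.Long")
  .setValueType(classOf[Row].getName) // the part that doesn't work
  .setFields(fields)
  .setIndexes(Collections.singletonList(new QueryIndex("country")))

// Cache configuration that would back the IgniteRDD.
val cacheCfg = new CacheConfiguration[java.lang.Long, Row]("resultCache")
  .setQueryEntities(Collections.singletonList(entity))
```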
Best regards,
Dmitry

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/index-and-query-org-apache-ignite-spark-IgniteRDD-String-org-apache-spark-sql-Row-tp3343p3363.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
