Question regarding Projection PushDown

2021-08-27 Thread satyajit vegesna
Hi All, please help with the question below. I am trying to build my own data source to connect to CustomAerospike. I am almost done with everything, but still not sure how to implement projection pushdown while selecting nested columns. Spark does implicit column projection pushdown, but
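
For reference, a minimal sketch of how column pruning is usually wired into a DataSource V2 reader, assuming Spark 3.x's SupportsPushDownRequiredColumns interface; the class and source names below are illustrative, not the poster's code:

import org.apache.spark.sql.connector.read.{Scan, SupportsPushDownRequiredColumns}
import org.apache.spark.sql.types.StructType

// Hypothetical scan builder for a custom Aerospike source; Spark calls pruneColumns
// with the (possibly nested) subset of the schema it actually needs.
class AerospikeScanBuilder(fullSchema: StructType) extends SupportsPushDownRequiredColumns {

  private var requiredSchema: StructType = fullSchema

  // Record the pruned schema; nested pruning arrives as a StructType whose struct
  // fields contain only the selected sub-fields.
  override def pruneColumns(requiredSchema: StructType): Unit = {
    this.requiredSchema = requiredSchema
  }

  // Hand requiredSchema to the actual Scan so the reader fetches only those fields.
  override def build(): Scan = ???
}

For the built-in file formats, nested column pruning is additionally gated by the spark.sql.optimizer.nestedSchemaPruning.enabled setting; a custom source has to honour the pruned schema itself.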

Spark error while trying to spark.read.json()

2017-12-19 Thread satyajit vegesna
Hi All, can anyone help me with the below error? Exception in thread "main" java.lang.AbstractMethodError at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:278) at org.apache.spark.sql.types.StructType.filterNot(StructType.scala:98) at

Access Array StructField inside StructType.

2017-12-12 Thread satyajit vegesna
Hi All, how can I iterate over the StructFields inside *after*? StructType(StructField(*after*,StructType(*StructField(Alarmed,LongType,true), StructField(CallDollarLimit,StringType,true), StructField(CallRecordWav,StringType,true), StructField(CallTimeLimit,LongType,true),
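
A small sketch of walking a nested schema like the one above; the field names come from the snippet, while the traversal itself is generic:

import org.apache.spark.sql.types.{StructField, StructType}

// Recursively visit every field, descending into nested StructTypes.
def walk(schema: StructType, prefix: String = ""): Unit =
  schema.fields.foreach {
    case StructField(name, nested: StructType, _, _) =>
      walk(nested, s"$prefix$name.")
    case StructField(name, dataType, nullable, _) =>
      println(s"$prefix$name: $dataType (nullable = $nullable)")
  }

// e.g. walk(df.schema) would list after.Alarmed, after.CallDollarLimit, ...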

Joining streaming data with static table data.

2017-12-11 Thread satyajit vegesna
Hi All, I am working on a real-time reporting project and I have a question about a structured streaming job that is going to stream a particular table's records and would have to join them to an existing table. Stream --> query/join to another DF/DS ---> update the stream data record. Now I have a
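
A minimal sketch of the stream-to-static join being described; the static path, the join column "id", and the rate source standing in for the real table stream are all placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("stream-static-join").getOrCreate()

// Static reference table (hypothetical path, assumed to have an "id" column).
val staticDF = spark.read.parquet("/path/to/existing_table")

// Any streaming DataFrame works here; the built-in rate source keeps the sketch self-contained.
val streamDF = spark.readStream.format("rate").load()
  .withColumnRenamed("value", "id")

// Structured Streaming supports joining a streaming DataFrame with a static one;
// the joined result can then be written out to update downstream records.
val joined = streamDF.join(staticDF, Seq("id"))

val query = joined.writeStream.format("console").outputMode("append").start()
query.awaitTermination()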

Re: Infer JSON schema in structured streaming Kafka.

2017-12-11 Thread satyajit vegesna
--- >> https://about.me/JacekLaskowski >> Spark Structured Streaming https://bit.ly/spark-structured-streaming >> Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark >> Follow me at https://twitter.com/jaceklaskowski >> >> On Mon, Dec 11, 2017 at 9:15 AM, s

Re: Infer JSON schema in structured streaming Kafka.

2017-12-11 Thread satyajit vegesna
help on how to approach the situation programmatically/any examples > pointed would highly be appreciated. > > Regards, > Satyajit. > > > > On Sun, Dec 10, 2017 at 9:52 PM, Jacek Laskowski <ja...@japila.pl> wrote: > >> Hi, >> >> What about memory sink? Th

Re: Infer JSON schema in structured streaming Kafka.

2017-12-10 Thread satyajit vegesna
ory sink? That could work. > > Pozdrawiam, > Jacek Laskowski > > https://about.me/JacekLaskowski > Spark Structured Streaming https://bit.ly/spark-structured-streaming > Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jacekl

Infer JSON schema in structured streaming Kafka.

2017-12-10 Thread satyajit vegesna
Hi All, I would like to infer a JSON schema from a sample of the data that I receive from Kafka streams (a specific topic), and I have to infer the schema because I am going to receive random JSON strings with a different schema for each topic, so I chose to go ahead with the below steps: a. readStream from
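
One common workaround (a sketch, not necessarily what the thread settled on): infer the schema once from a small static sample with the batch JSON reader, then apply it in the stream with from_json. Paths, broker, and topic below are placeholders:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json

val spark = SparkSession.builder().appName("infer-json-schema").getOrCreate()
import spark.implicits._

// 1. Infer the schema from a sampled batch of messages saved to a file (placeholder path).
val sampleSchema = spark.read.json("/tmp/kafka_sample.json").schema

// 2. Apply the inferred schema to the streaming values; Structured Streaming itself
//    will not infer JSON schemas at runtime.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "some_topic")
  .load()
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json($"json", sampleSchema).as("data"))
  .select("data.*")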

RDD[InternalRow] -> Dataset

2017-12-07 Thread satyajit vegesna
Hi All, is there a way to convert an RDD[InternalRow] to a Dataset from outside the spark sql package? Regards, Satyajit.
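
There is no public API for this; one workaround outside the sql package is to convert each InternalRow back to an external Row and call createDataFrame. A hedged sketch (CatalystTypeConverters is an internal Catalyst class, so this can break across Spark versions):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
import org.apache.spark.sql.types.StructType

// Convert RDD[InternalRow] -> RDD[Row] with Catalyst's converter, then build a DataFrame.
def internalRowsToDF(spark: SparkSession, rows: RDD[InternalRow], schema: StructType): DataFrame = {
  val rowRDD = rows.mapPartitions { iter =>
    // Build the converter on the executor side; only the schema gets shipped.
    val toRow = CatalystTypeConverters.createToScalaConverter(schema)
    iter.map(ir => toRow(ir).asInstanceOf[Row])
  }
  spark.createDataFrame(rowRDD, schema)
}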

Re: Json Parsing.

2017-12-06 Thread satyajit vegesna
t; >> You can use get_json function >> >> On Thu, 7 Dec 2017 at 10:39 am, satyajit vegesna < >> satyajit.apas...@gmail.com> wrote: >> >>> Does spark support automatic detection of schema from a json string in a >>> dataframe. >>&

Json Parsing.

2017-12-06 Thread satyajit vegesna
Does Spark support automatic detection of schema from a JSON string in a dataframe? I am trying to parse a JSON string and do some transformations on it (I would like to append new columns to the dataframe) from the data I stream from Kafka. But I am not very sure how I can parse the JSON in
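
Spark does not auto-detect the schema of a JSON string column; you normally supply one and use from_json. A sketch with an assumed two-field schema and an assumed string column "value" holding the Kafka payload:

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumed schema for the incoming JSON; replace with the real fields.
val jsonSchema = StructType(Seq(
  StructField("name", StringType),
  StructField("city", StringType)
))

// Parse the JSON string and append the extracted fields as new columns.
val parsed = df
  .withColumn("parsed", from_json(col("value"), jsonSchema))
  .withColumn("name", col("parsed.name"))
  .withColumn("city", col("parsed.city"))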

Re: Spark Project build Issues.(Intellij)

2017-06-28 Thread satyajit vegesna
dongjoon.h...@gmail.com> wrote: > Did you follow the guide in `IDE Setup` -> `IntelliJ` section of > http://spark.apache.org/developer-tools.html ? > > Bests, > Dongjoon. > > On Wed, Jun 28, 2017 at 5:13 PM, satyajit vegesna < > satyajit.apas...@gmail.com> wrote: &

Spark Project build Issues.(Intellij)

2017-06-28 Thread satyajit vegesna
Hi All, when I try to build the Apache Spark source code from https://github.com/apache/spark.git, I am getting the below errors: Error:(9, 14) EventBatch is already defined as object EventBatch public class EventBatch extends org.apache.avro.specific.SpecificRecordBase implements

Re: Building Kafka 0.10 Source for Structured Streaming Error.

2017-06-28 Thread satyajit vegesna
OME/external/kafka-0-10-sql/target/*.jar". > > > i have tried building the jar with dependencies, but still face the same > error. > > What's the command you used? > > On Wed, Jun 28, 2017 at 12:00 PM, satyajit vegesna < > satyajit.apas...@gmail.com> wrote: > &

Building Kafka 0.10 Source for Structured Streaming Error.

2017-06-28 Thread satyajit vegesna
Hi All, I am trying to build the kafka-0-10-sql module under the external folder in the Apache Spark source code. Once I generate the jar file using build/mvn package -DskipTests -pl external/kafka-0-10-sql, I get the jar file created under external/kafka-0-10-sql/target, and try to run spark-shell with jars

Null pointer exception with RDD while computing a method, creating dataframe.

2016-12-20 Thread satyajit vegesna
Hi All, PFB sample code , val df = spark.read.parquet() df.registerTempTable("df") val zip = df.select("zip_code").distinct().as[String].rdd def comp(zipcode:String):Unit={ val zipval = "SELECT * FROM df WHERE zip_code='$zipvalrepl'".replace("$zipvalrepl", zipcode) val data =
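
A likely cause of an NPE in this pattern (a guess based on the snippet, not a confirmed diagnosis of the thread) is calling spark.sql, or anything on the SparkSession, inside a function that runs on executors; the session only exists on the driver. A sketch of keeping the per-zip query on the driver instead, with placeholder paths and output handling:

import spark.implicits._

val df = spark.read.parquet("/path/to/input")   // placeholder path
df.createOrReplaceTempView("df")                // registerTempTable on older versions

// Collect the distinct zip codes to the driver, then issue each query from the driver,
// where the SparkSession is actually available.
val zips = df.select("zip_code").distinct().as[String].collect()

zips.foreach { zipcode =>
  val data = spark.sql(s"SELECT * FROM df WHERE zip_code = '$zipcode'")
  // ... write or aggregate `data` for this zip code here ...
}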

[no subject]

2016-12-20 Thread satyajit vegesna
Hi All, PFB sample code , val df = spark.read.parquet() df.registerTempTable("df") val zip = df.select("zip_code").distinct().as[String].rdd def comp(zipcode:String):Unit={ val zipval = "SELECT * FROM df WHERE zip_code='$zipvalrepl'".replace("$zipvalrepl", zipcode) val data =

Document Similarity - Spark MLlib

2016-12-09 Thread satyajit vegesna
Hi All, I am trying to implement an MLlib Spark job to find the similarity between documents (which in my case are basically home addresses). I believe I cannot use DIMSUM for my use case, as DIMSUM works well only with matrices that have thin columns and many rows. Matrix example format, for my
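
For what it's worth, a sketch of one alternative when DIMSUM's tall-skinny assumption doesn't hold: tokenize and vectorize the addresses, then use MinHash LSH's approximate similarity self-join from spark.ml (available from roughly 2.1). The input DataFrame "addresses", its "address" column, and the 0.4 threshold are all illustrative:

import org.apache.spark.ml.feature.{CountVectorizer, MinHashLSH, Tokenizer}

// addresses: DataFrame with a string column "address" (hypothetical input).
val tokens = new Tokenizer().setInputCol("address").setOutputCol("words").transform(addresses)
val vectorized = new CountVectorizer().setInputCol("words").setOutputCol("features")
  .fit(tokens).transform(tokens)

val lsh = new MinHashLSH().setInputCol("features").setOutputCol("hashes").setNumHashTables(5)
val model = lsh.fit(vectorized)

// Approximate self-join: pairs of addresses whose Jaccard distance is below 0.4.
val similarPairs = model.approxSimilarityJoin(vectorized, vectorized, 0.4, "jaccardDistance")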

Issue in using DenseVector in RowMatrix, error could be due to ml and mllib package changes

2016-12-08 Thread satyajit vegesna
Hi All, PFB code. import org.apache.spark.ml.feature.{HashingTF, IDF} import org.apache.spark.ml.linalg.SparseVector import org.apache.spark.mllib.linalg.distributed.RowMatrix import org.apache.spark.sql.SparkSession import org.apache.spark.{SparkConf, SparkContext} /** * Created by satyajit
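
The mixed imports above are the usual source of this kind of error: RowMatrix lives in mllib and expects org.apache.spark.mllib.linalg vectors, while HashingTF/IDF from spark.ml produce org.apache.spark.ml.linalg vectors. A sketch of bridging the two, assuming a DataFrame "tfidf" with an ml-package "features" column:

import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Convert the new ml vectors to the old mllib vectors that RowMatrix expects.
val mllibRows = tfidf.select("features").rdd
  .map(row => Vectors.fromML(row.getAs[MLVector]("features")))

val mat = new RowMatrix(mllibRows)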

Re: Issues in compiling spark 2.0.0 code using scala-maven-plugin

2016-09-30 Thread satyajit vegesna
> > > i am trying to compile code using maven ,which was working with spark > 1.6.2, but when i try for spark 2.0.0 then i get below error, > > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (default) on >

Issues in compiling spark 2.0.0 code using scala-maven-plugin

2016-09-29 Thread satyajit vegesna
Hi All, I am trying to compile code using Maven, which was working with Spark 1.6.2, but when I try with Spark 2.0.0 I get the below error: org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (default) on project

Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-02 Thread satyajit vegesna
Hi All, I am trying to run a Spark job using YARN, and I specify the --executor-cores value as 20. But when I check the "nodes of the cluster" page at http://hostname:8088/cluster/nodes, I see 4 containers getting created on each of the nodes in the cluster, but can only see 1 vcore getting

HiveContext: difficulties in accessing tables in Hive schemas/databases other than the default database.

2016-07-19 Thread satyajit vegesna
Hi All, I have been trying to access tables from schemas other than default to pull data into a dataframe. I was successful in doing it using the default schema in the Hive database, but when I try any other schema/database in Hive, I am getting the below error. (I have also not seen any examples
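
For reference, the two usual ways to address a non-default Hive database from a HiveContext (a sketch; "mydb" and "mytable" are placeholders):

// Qualify the table with the database name...
val df1 = hiveContext.sql("SELECT * FROM mydb.mytable")

// ...or switch the current database first.
hiveContext.sql("USE mydb")
val df2 = hiveContext.table("mytable")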

Fwd: Master options Cluster/Client discrepancies.

2016-03-29 Thread satyajit vegesna
Hi All, I have written a Spark program on my dev box (IDE: IntelliJ, Scala version: 2.11.7, Spark version: 1.6.1); it runs fine from the IDE when I provide proper input and output paths, including the master. But when I try to deploy the code in my cluster, made up of the below, Spark

Master options Cluster/Client discrepancies.

2016-03-28 Thread satyajit vegesna
Hi All, I have written a Spark program on my dev box (IDE: IntelliJ, Scala version: 2.11.7, Spark version: 1.6.1); it runs fine from the IDE when I provide proper input and output paths, including the master. But when I try to deploy the code in my cluster, made up of the below, Spark

Fwd: Apache Spark Exception in thread “main” java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class

2016-03-19 Thread satyajit vegesna
Hi, Scala version: 2.11.7 (had to upgrade the Scala version to enable case classes to accept more than 22 parameters), Spark version: 1.6.1. PFB pom.xml. Getting the below error when trying to set up Spark in the IntelliJ IDE: 16/03/16 18:36:44 INFO spark.SparkContext: Running Spark version 1.6.1

Fwd: DF creation

2016-03-18 Thread satyajit vegesna
Hi, I am trying to create a separate val reference to the object DATA (as shown below): case class data(name:String,age:String). Creation of this object is done separately and the reference to the object is stored in val data. I use val samplerdd = sc.parallelize(Seq(data)) to create the RDD.
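
A possible source of confusion here: Seq(data) wraps a reference to the case-class companion object rather than instances of the class. A sketch of parallelizing actual instances and turning them into a DataFrame (Spark 1.x style; the field values are illustrative):

case class data(name: String, age: String)

// Create instances, not a reference to the companion object.
val records = Seq(data("alice", "30"), data("bob", "42"))
val samplerdd = sc.parallelize(records)

// toDF requires the SQL implicits in scope.
import sqlContext.implicits._
val sampledf = samplerdd.toDF()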

Data not getting printed in Spark Streaming with print().

2016-01-28 Thread satyajit vegesna
Hi All, I am trying to run the HdfsWordCount example from GitHub: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala. I am using Ubuntu to run the program, but don't see any data getting printed after,
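
A common reason nothing is printed with this example (a guess, not a confirmed diagnosis of the thread): textFileStream only picks up files that appear in the watched directory after the stream has started. A minimal sketch with a placeholder directory:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("HdfsWordCount")
val ssc = new StreamingContext(conf, Seconds(5))

// Only files created or moved into this directory AFTER start() are processed,
// so copy new files in while the job is running.
val lines = ssc.textFileStream("/tmp/stream-input")
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.print()

ssc.start()
ssc.awaitTermination()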

Parquet SaveMode.Append Trouble.

2015-07-30 Thread satyajit vegesna
Hi, I am new to using Spark and Parquet files. Below is what I am trying to do on spark-shell: val df = sqlContext.parquetFile(/data/LM/Parquet/Segment/pages/part-m-0.gz.parquet). Have also tried the below command, val
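
For completeness, the DataFrame writer form of an append to an existing Parquet location (a sketch; sqlContext.parquetFile is the deprecated 1.x reader, and both paths below are placeholders):

import org.apache.spark.sql.SaveMode

// Read an existing Parquet file with the 1.4+ reader API.
val df = sqlContext.read.parquet("/path/to/input.parquet")

// Append rows to an existing Parquet directory instead of overwriting it.
df.write.mode(SaveMode.Append).parquet("/path/to/output")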