Hi All,
Please help with the question below.
I am trying to build my own data source to connect to CustomAerospike.
I am almost done with everything, but I am still not sure how to implement
projection pushdown when selecting nested columns.
Spark does column projection pushdown implicitly, but
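For top-level columns, what I have so far is the Data Source V1 pruning hook; a minimal sketch is below (the relation and the client call are hypothetical). Note that Spark only passes top-level column names into buildScan, so a nested select such as a.b still arrives as the whole a column:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, PrunedScan}
import org.apache.spark.sql.types.StructType

// Hypothetical relation for the custom Aerospike source.
class AerospikeRelation(override val sqlContext: SQLContext,
                        fullSchema: StructType)
  extends BaseRelation with PrunedScan {

  override def schema: StructType = fullSchema

  // Spark hands over only the *top-level* column names the query needs;
  // nested-field pruning is not expressed through this interface.
  override def buildScan(requiredColumns: Array[String]): RDD[Row] = {
    ??? // hypothetical: fetch only requiredColumns (bins) from Aerospike
  }
}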
Hi All,
Can anyone help me with the error below?
Exception in thread "main" java.lang.AbstractMethodError
  at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:278)
  at org.apache.spark.sql.types.StructType.filterNot(StructType.scala:98)
  at
Hi All,
How do I iterate over the StructFields inside "after", given the schema below?
StructType(StructField(after,StructType(StructField(Alarmed,LongType,true),
StructField(CallDollarLimit,StringType,true),
StructField(CallRecordWav,StringType,true),
StructField(CallTimeLimit,LongType,true),
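A minimal sketch of walking those nested fields, assuming df is the DataFrame whose printed schema is shown above:

import org.apache.spark.sql.types.StructType

// "after" is itself a StructType, so cast its dataType and walk the fields
val afterStruct = df.schema("after").dataType.asInstanceOf[StructType]
afterStruct.fields.foreach { f =>
  println(s"${f.name}: ${f.dataType}, nullable=${f.nullable}")
}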
Hi All,
I am working on a real-time reporting project and have a question about a
structured streaming job that is going to stream a particular table's
records and would have to join them to an existing table.
Stream > query/join to another DF/DS ---> update the Stream data record.
Now I have a
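As a rough shape of what is described above, a stream-to-static join would look like the following sketch (the paths, topic, and join key are hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("stream-join").getOrCreate()

// static side: the existing table (hypothetical path)
val staticDf = spark.read.parquet("/path/to/existing/table")

// streaming side (hypothetical Kafka source)
val streamDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "records")
  .load()
  .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

// stream-static joins are supported; the static side is consulted per micro-batch
val joined = streamDf.join(staticDf, Seq("key"))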
On Mon, Dec 11, 2017 at 9:15 AM, satyajit vegesna wrote:

> help on how to approach the situation programmatically; any examples
> pointed to would be highly appreciated.
>
> Regards,
> Satyajit.
>
> On Sun, Dec 10, 2017 at 9:52 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> What about the memory sink? That could work.
>>
>> Pozdrawiam,
>> Jacek Laskowski
Hi All,
I would like to infer a JSON schema from a sample of the data I receive
from Kafka streams (a specific topic). I have to infer the schema because I
am going to receive random JSON strings with a different schema for each
topic, so I chose to go ahead with the steps below,
a. readStream from
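A minimal sketch of the inference idea (assumes Spark 2.2+, where both batch Kafka reads and spark.read.json on a Dataset[String] are available; the topic and servers are hypothetical):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// 1. read a bounded sample of the topic as a batch to infer the schema
val sample = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "some-topic")
  .load()
  .selectExpr("CAST(value AS STRING)")
  .as[String]
val inferredSchema = spark.read.json(sample).schema

// 2. apply the inferred schema to the actual stream
val parsed = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "some-topic")
  .load()
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json($"json", inferredSchema).as("data"))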
Hi All,
Is there a way to convert an RDD[InternalRow] to a Dataset from outside the
spark sql package?
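One possible workaround, relying on Spark 2.x internals (private[sql] APIs can change between versions), is to place a small helper inside the org.apache.spark.sql package itself so it can reach SparkSession.internalCreateDataFrame:

// Must live in this package to see the private[sql] API.
package org.apache.spark.sql

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.StructType

object InternalRowHelper {
  def toDF(spark: SparkSession, rows: RDD[InternalRow], schema: StructType): DataFrame =
    spark.internalCreateDataFrame(rows, schema)
}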
Regards,
Satyajit.
>> You can use the get_json_object function
>>
>> On Thu, 7 Dec 2017 at 10:39 am, satyajit vegesna <
>> satyajit.apas...@gmail.com> wrote:
>>
>>> Does spark support automatic detection of schema from a json string in a
>>> dataframe?
Does spark support automatic detection of schema from a json string in a
dataframe?
I am trying to parse a json string and do some transformations on it (I
would like to append new columns to the dataframe) from the data I stream
from kafka.
But I am not very sure how I can parse the json in
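Following the get_json_object suggestion above, a minimal sketch for appending columns extracted from a JSON string column (the column and field names are hypothetical):

import org.apache.spark.sql.functions.{col, get_json_object}

// df has a string column "json" holding the raw message from Kafka
val withCols = df
  .withColumn("name", get_json_object(col("json"), "$.name"))
  .withColumn("age", get_json_object(col("json"), "$.age"))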
dongjoon.h...@gmail.com> wrote:
> Did you follow the guide in `IDE Setup` -> `IntelliJ` section of
> http://spark.apache.org/developer-tools.html ?
>
> Bests,
> Dongjoon.
>
> On Wed, Jun 28, 2017 at 5:13 PM, satyajit vegesna <
> satyajit.apas...@gmail.com> wrote:
Hi All,
When I try to build the source code of Apache Spark from
https://github.com/apache/spark.git, I get the errors below:
Error:(9, 14) EventBatch is already defined as object EventBatch
public class EventBatch extends org.apache.avro.specific.SpecificRecordBase
implements
OME/external/kafka-0-10-sql/target/*.jar".
>
>> I have tried building the jar with dependencies, but still face the same
>> error.
>
> What's the command you used?
>
> On Wed, Jun 28, 2017 at 12:00 PM, satyajit vegesna <
> satyajit.apas...@gmail.com> wrote:
Hi All,
I am trying to build the Kafka-0-10-sql module under the external folder in
the Apache Spark source code.
Once I generate the jar file using
build/mvn package -DskipTests -pl external/kafka-0-10-sql
I get a jar file created under external/kafka-0-10-sql/target,
and try to run spark-shell with jars
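presumably along these lines (the exact jar name depends on the Scala and Spark versions of the build, so this one is hypothetical):

bin/spark-shell --jars external/kafka-0-10-sql/target/spark-sql-kafka-0-10_2.11-2.3.0-SNAPSHOT.jar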
Hi All,
PFB sample code:
val df = spark.read.parquet()
df.registerTempTable("df")
val zip = df.select("zip_code").distinct().as[String].rdd
def comp(zipcode: String): Unit = {
  val zipval = "SELECT * FROM df WHERE zip_code='$zipvalrepl'"
    .replace("$zipvalrepl", zipcode)
  val data =
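For what it's worth, spark.sql can only be called on the driver, not inside RDD operations such as rdd.map, so a driver-side version of the loop would look like this sketch (assumes the distinct zip codes fit on the driver):

// collect the distinct zip codes to the driver, then query per code
val zips = df.select("zip_code").distinct().as[String].collect()
zips.foreach { zipcode =>
  val data = spark.sql(s"SELECT * FROM df WHERE zip_code = '$zipcode'")
  // ... work with data
}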
Hi All,
I am trying to implement an MLlib Spark job to find the similarity between
documents (in my case, these are basically home addresses).
I believe I cannot use DIMSUM for my use case, as DIMSUM works well only
with matrices that have thin columns and many rows.
matrix example format, for my
Hi All,
PFB code.
import org.apache.spark.ml.feature.{HashingTF, IDF}
import org.apache.spark.ml.linalg.SparseVector
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
/**
* Created by satyajit
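The shape I have in mind with these imports is roughly the following sketch (addressDf with a pre-tokenized "words" column is hypothetical; note that RowMatrix.columnSimilarities is the DIMSUM routine and compares columns, not rows):

import org.apache.spark.ml.feature.{HashingTF, IDF}
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// TF-IDF over the address tokens
val tf = new HashingTF().setInputCol("words").setOutputCol("rawFeatures")
val featurized = tf.transform(addressDf)
val idfModel = new IDF().setInputCol("rawFeatures").setOutputCol("features").fit(featurized)

// convert the new ml vectors to the old mllib type that RowMatrix needs
val vectors = idfModel.transform(featurized)
  .select("features").rdd
  .map(row => Vectors.fromML(row.getAs[MLVector](0)))

val mat = new RowMatrix(vectors)
val sims = mat.columnSimilarities(0.1) // DIMSUM with a threshold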
Hi All,
I am trying to compile code using Maven, which was working with Spark
1.6.2, but when I try Spark 2.0.0 I get the error below:
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (default) on
project
Hi All,
I am trying to run a Spark job using YARN, and I specify the
--executor-cores value as 20.
But when I check the "Nodes of the cluster" page at
http://hostname:8088/cluster/nodes, I see 4 containers getting created
on each node in the cluster,
but can only see 1 vcore getting
Hi All,
I have been trying to access tables from schemas other than default, to
pull data into a dataframe.
I was successful in doing it using the default schema in the Hive database,
but when I try any other schema/database in Hive, I get the error below.
(I have also not seen any examples
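What I have been attempting is the two usual shapes below (database and table names are hypothetical; assumes a Hive-enabled context):

// qualify the table with the database name
val df1 = sqlContext.sql("SELECT * FROM other_db.some_table")

// or switch the current database first
sqlContext.sql("USE other_db")
val df2 = sqlContext.table("some_table")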
Hi All,
I have written a Spark program on my dev box:
IDE: IntelliJ
Scala version: 2.11.7
Spark version: 1.6.1
It runs fine from the IDE, by providing proper input and output paths,
including the master.
But when I try to deploy the code in my cluster, made up of the below,
Spark
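For deployment I am using the usual spark-submit shape, roughly like this (the class name, master URL, jar, and paths here are hypothetical placeholders):

bin/spark-submit \
  --class com.example.MyJob \
  --master spark://master-host:7077 \
  --executor-memory 4G \
  target/myjob-1.0.jar /input/path /output/path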
Hi,
Scala version: 2.11.7 (had to upgrade the Scala version to enable case
classes to accept more than 22 parameters).
Spark version: 1.6.1.
PFB pom.xml.
Getting the error below when trying to set up Spark in the IntelliJ IDE:
16/03/16 18:36:44 INFO spark.SparkContext: Running Spark version 1.6.1
Hi,
I am trying to create a separate val reference to an object of type data
(as shown below):
case class data(name: String, age: String)
Creation of this object is done separately, and the reference to the object
is stored into val data.
I use val samplerdd = sc.parallelize(Seq(data)) to create the RDD.
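A sketch of that pattern; giving the val a name different from the case class avoids the companion object shadowing the reference (the values here are made up):

case class data(name: String, age: String)

// create the instance separately, then hold a reference to it
val record = data("alice", "30")

// an RDD containing that single record
val samplerdd = sc.parallelize(Seq(record))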
Hi All,
I am trying to run the HdfsWordCount example from GitHub:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
I am using Ubuntu to run the program, but don't see any data getting
printed after,
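For reference, the example's header comment runs it as below; note that words are only counted for text files created in the watched directory after the job starts:

bin/run-example org.apache.spark.examples.streaming.HdfsWordCount localdir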
Hi,
I am new to using Spark and Parquet files.
Below is what I am trying to do on the spark-shell:
val df =
sqlContext.parquetFile("/data/LM/Parquet/Segment/pages/part-m-0.gz.parquet")
I have also tried the command below:
val
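For what it's worth, parquetFile is the older, deprecated API; the equivalent on Spark 1.4+ is:

val df = sqlContext.read.parquet("/data/LM/Parquet/Segment/pages/part-m-0.gz.parquet")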