Hi,
I am new to Scala and Spark and trying to find the relevant API in
DataFrame to solve my problem, as described in the title. However, I have only
found the API DataFrame.col(colName: String): Column, which returns a Column
object, not the content. If only DataFrame supported such an API which
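For what it's worth, a minimal sketch of one way to get at the actual values
(Spark 1.x API; the file name, DataFrame, and column name are placeholders, not
from the original question):

val df = sqlContext.read.json("people.json")   // any DataFrame; sqlContext assumed to exist

// df.col("name") only builds a Column expression. To get the content,
// select the column and collect the rows back to the driver:
val names: Array[String] = df.select("name").collect().map(_.getString(0))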
Hi folks,
I wrote some Spark jobs, and these jobs ran successfully when I ran them
one by one. But when I ran them concurrently, for example 12 jobs running in
parallel, I got the following error. Could anybody tell me what causes this? How
do I solve it? Many thanks!
Exception in thread "main"
You can use DF.groupBy(upper(col("a"))).agg(sum(col("b"))).
The DataFrame API provides an "upper" function to convert a column to uppercase.
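A minimal self-contained sketch of that approach (Spark 1.x; the column names
"a" and "b" and the sample data are illustrative):

import org.apache.spark.sql.functions.{col, sum, upper}
import sqlContext.implicits._   // assumes a Spark 1.x SQLContext named sqlContext

val df = sc.parallelize(Seq(("US", 1), ("us", 2), ("FR", 3))).toDF("a", "b")

// upper() normalizes the grouping key, so "US" and "us" land in the same group:
df.groupBy(upper(col("a"))).agg(sum(col("b"))).show()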
2015-12-24 20:47 GMT+08:00 Eran Witkon :
> Use DF.withColumn("upper-code", upper(df("countrycode")))
> or just run a map function that does the same
>
> On Thu, Dec 24, 20
Hello,
I have a batch driver and a streaming driver using the same functions (Scala). I
use accumulators (passed to the functions' constructors) to count stuff.
In the batch driver, doing so at the right point in the pipeline, I'm able
to retrieve the accumulator value and print it as a log4j log.
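For reference, a minimal sketch of that batch pattern, with made-up names
(sc is an existing SparkContext; the input file is a placeholder):

import org.apache.spark.Accumulator

// A helper that receives the accumulator through its constructor:
class RecordCounter(counter: Accumulator[Long]) extends Serializable {
  def process(line: String): String = {
    counter += 1L   // runs on the executors
    line
  }
}

val counter = sc.accumulator(0L, "records")
val worker = new RecordCounter(counter)
sc.textFile("input.txt").map(worker.process).count()   // an action forces evaluation

// In the batch driver the value is now stable and can be logged via log4j:
println(s"processed ${counter.value} records")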
In the streaming
No luck.
But two updates:
1. I have downloaded spark-1.4.1 and everything works fine; I don't see any
error.
2. I have added the following XML file to Spark 1.5.2's conf directory and
now I get the following error:
Caused by: java.lang.RuntimeException: The root scratch dir:
c:/Users/marco/tmp on HDF
Problem must be with how I am converting a JavaRDD<Tuple2<Long, LabeledPoint>>
to a DataFrame.
Any suggestions? Most of my work has been done using PySpark. Tuples are a
lot harder to work with in Java.
JavaRDD<Tuple2<Long, LabeledPoint>> predictions =
    idLabeledPoingRDD.map((Tuple2<Long, LabeledPoint> t2) -> {
        Long id = t2._1();
        LabeledPoint
Hi,
Any idea how I can debug this problem? I suspect the problem has to do with
how I am converting a JavaRDD<Tuple2<Long, LabeledPoint>> to a DataFrame.
Is it a boxing problem? I tried to use long and double instead of Long and
Double whenever possible.
Thanks in advance, Happy Holidays.
Andy
allData.printSchema()
root
We are using the older receiver-based approach; the number of partitions is 1
(we have a single-node Kafka) and we use a single thread per topic, yet we still
have the problem. Please see the API we use. All 8 Spark jobs use the same group
name – is that a problem?
val topicMap = topics.split(",").map((_, 1)).toMap
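For reference, a sketch of that receiver-based call (Spark 1.x; the ZooKeeper
address and group id are placeholders, and ssc is an existing StreamingContext).
Regarding the group name question: Kafka's high-level consumer divides each
topic's partitions among all consumers sharing a group id, so with a
single-partition topic and 8 jobs in one group, only one job would receive the
data; giving each job its own group id is worth trying.

import org.apache.spark.streaming.kafka.KafkaUtils

// topicMap as built above; "zkhost:2181" and the group id are placeholders:
val stream = KafkaUtils.createStream(ssc, "zkhost:2181", "shared-group", topicMap)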
Hi,
To add to it, you can read about the native libs in
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html.
Regards,
Jacek
Jacek Laskowski | https://medium.com/@jaceklaskowski/
Mastering Apache Spark
==> https://jaceklaskowski.gitbooks.io/mastering-a
You can safely ignore it. Native libs aren't set with HADOOP_HOME. See the
Hadoop docs on how to configure this if you're curious, but you really
don't need to.
On Thu, Dec 24, 2015 at 12:19 PM, Bilinmek Istemiyor
wrote:
> Hello,
>
> I have apache spark 1.5.1 installed with the help of this user gro
Could anyone help?
On Wed, Dec 23, 2015 at 1:40 PM, Li Li wrote:
> I ran my LDA example on a YARN 2.6.2 cluster with Spark 1.5.2.
> It throws an exception at the line: Matrix topics = ldaModel.topicsMatrix();
> But in the YARN job history UI it shows as successful. What's wrong with it?
> I submit job with
> .b
Use DF.withColumn("upper-code", upper(df("countrycode")))
or just run a map function that does the same
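A hedged sketch putting that together (Spark 1.x; df and the column names come
from the thread, the rest is illustrative). Column has no toUpper method, which
is why upper() from org.apache.spark.sql.functions is used above:

import org.apache.spark.sql.functions.upper

val withUpper = df.withColumn("upper-code", upper(df("countrycode")))
withUpper.groupBy(withUpper("upper-code")).count().show()   // one row per country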
On Thu, Dec 24, 2015 at 2:05 PM Bharathi Raja
wrote:
> Hi,
> Values in a dataframe column named countrycode are in different cases. Eg:
> (US, us). groupBy & count gives two rows but the requ
Hello,
I have Apache Spark 1.5.1 installed with the help of this user group. I
receive the following error when I start the PySpark shell:
WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
Later I downloaded the native binary from had
Thanks Eran, I'll check the solution.
Regards,
Raja
-Original Message-
From: "Eran Witkon"
Sent: 12/24/2015 4:07 PM
To: "Bharathi Raja" ; "Gokula Krishnan D"
Cc: "user@spark.apache.org"
Subject: Re: How to Parse & flatten JSON object in a text file using
Spark&Scala into Dataframe
Hi,
Values in a dataframe column named countrycode are in different cases, e.g.
(US, us). groupBy & count gives two rows, but the requirement is to ignore case
for this operation.
1) Is there a way to ignore case in groupBy? Or
2) Is there a way to update the dataframe column countrycode to uppercase?
Are you using a direct stream consumer, or the older receiver-based consumer?
If the latter, do the number of partitions you've specified for your topic
match the number of partitions in the topic on Kafka?
That would be a possible cause – as you might receive all data from a given
partition
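For comparison, a minimal sketch of the direct consumer (Spark 1.3+), whose RDD
partitions map 1:1 to the Kafka topic's partitions; the broker address and
topic are placeholders, and ssc is an existing StreamingContext:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("mytopic"))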
You can send out a pull request for the JIRA you're interested in.
Start the title of the pull request with:
[SPARK-XYZ] ...
where XYZ is the JIRA number.
The pull request will be posted on the JIRA.
After the pull request is reviewed, tested by QA, and merged, the committer
will assign your name to the
Answered on StackOverflow. If you are looking for the solution, this is the trick:
val jsonNested = sqlContext.read.json(jsonUnGzip.map {
  case Row(cty: String, json: String, nm: String, yrs: String) =>
    s"""{"cty": "$cty", "extractedJson": $json, "nm": "$nm", "yrs": "$yrs"}"""
})
See this link
Hi,
From the "how to contribute" page of the Spark JIRA project I came to know that
I can start by picking up bugs with the "starter" label.
But who will assign these bugs to me? Or should I just fix them and create a
pull request?
I will be glad to help the project.
Mind providing a bit more detail?
- release of Spark
- version of the Cassandra connector
- how the job was submitted
- the complete stack trace
Thanks
On Thu, Dec 24, 2015 at 2:06 AM, Vijay Kandiboyina wrote:
> java.lang.NoClassDefFoundError:
> com/datastax/spark/connector/rdd/CassandraTableScanRDD
>
>
Raja! I found the answer to your question!
Look at
http://stackoverflow.com/questions/34069282/how-to-query-json-data-column-using-spark-dataframes
This is what you (and I) were looking for.
The general idea: you read the list as text, where project Details is just a
string field, and then you build the
Hi All,
We are using Bitnami Kafka 0.8.2 + Spark 1.5.2 on Google Cloud Platform. Our
Spark streaming job (consumer) is not receiving all the messages sent to the
specific topic. It receives 1 out of ~50 messages (we added a log in the job
stream and identified this). We are not seeing any errors in the kaf
java.lang.NoClassDefFoundError:
com/datastax/spark/connector/rdd/CassandraTableScanRDD
Hi,
I have a JSON file with the following row format:
{"cty":"United
Kingdom","gzip":"H4sIAKtWystVslJQcs4rLVHSUUouqQTxQvMyS1JTFLwz89JT8nOB4hnFqSBxj/zS4lSF/DQFl9S83MSibKBMZVExSMbQwNBM19DA2FSpFgDvJUGVUw==","nm":"Edmund
Ironside","yrs":"1016"}
The gzip field is a compressed JSON by itself
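A hedged sketch of one way to inflate that field: the value looks like
Base64-encoded gzip, so on Java 8 something along these lines should recover
the inner JSON string (the function name is made up):

import java.io.ByteArrayInputStream
import java.util.Base64
import java.util.zip.GZIPInputStream
import scala.io.Source

// Base64-decode the field, then gunzip the bytes back into a JSON string:
def unGzip(b64: String): String = {
  val in = new GZIPInputStream(new ByteArrayInputStream(Base64.getDecoder.decode(b64)))
  try Source.fromInputStream(in, "UTF-8").mkString finally in.close()
}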
You forgot a return statement in the 'else' clause, which is what the
compiler is telling you. There's nothing more to it here. Your
function is much simpler, however, as:
Function<String, Boolean> checkHeaders2 =
    x -> x.startsWith("npi") || x.startsWith("CPT");
On Thu, Dec 24, 2015 at 1:13 AM, rdpratti wrote:
> I
Would you mind posting the relevant code snippet?
Thanks
Best Regards
On Wed, Dec 23, 2015 at 7:33 PM, Vyacheslav Yanuk
wrote:
> Hi.
> I have a very strange situation with direct reading from Kafka.
> For example:
> I have 1000 messages in Kafka.
> After submitting my application, I read this data