Sai,
I am a bit confused here.
How are you using write with results?
I am using Spark 1.4.1, and when I use write, it complains about write not
being a member of DataFrame:
error: value write is not a member of org.apache.spark.sql.DataFrame
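For reference, a minimal sketch of the kind of call that fails on my side (the table name and output path are just placeholders):

val results = sqlContext.sql("SELECT * FROM some_table")   // results is a DataFrame
results.write.parquet("/tmp/results")   // this is the line that raises the error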
Thanks
Deepak
On Mon, Nov 16, 2015 at 4:10 PM, 张炜
Hi
I am looking for any blog or doc on developer best practices when using
Spark. I have already looked at the tuning guide on spark.apache.org.
Please do let me know if anyone is aware of any such resource.
Thanks
Deepak
Hi All
I am confused about RDD persistence in cache.
If I cache an RDD, will it stay in memory even after the Spark program that
created it completes execution?
If not, how can I guarantee that the RDD stays persisted in cache even after
the program finishes execution?
Thanks
Deepak
<engr...@gmail.com> wrote:
> The cache gets cleared out when the job finishes. I am not aware of a way
> to keep the cache around between jobs. You could save it as an object file
> to disk and load it as an object file on your next job for speed.
> On Thu, Nov 5, 2015 at 6:1
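A minimal sketch of the save/reload approach suggested above (paths and RDD contents are placeholders):

// In the first job: cache for this run, and also persist to disk for the next job.
val rdd = sc.textFile("/data/input").map(_.split(","))
rdd.cache()
rdd.saveAsObjectFile("hdfs:///tmp/shared_rdd")

// In the next job: reload the saved RDD instead of recomputing it.
val reloaded = sc.objectFile[Array[String]]("hdfs:///tmp/shared_rdd")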
Hi All
Sorry for spamming your inbox.
I am really keen to work on a big data project full time (preferably remote
from India); if not, I am open to volunteering as well.
Please do let me know if there is any such opportunity available.
--
Thanks
Deepak
An approach I can think of is using the Ambari Metrics Service (AMS).
Using these metrics, you can decide whether the cluster is low on
resources.
If yes, call the Ambari management API to add the node to the cluster.
Thanks
Deepak
On Mon, Dec 14, 2015 at 2:48 PM, cs user
I have never tried this, but there are YARN client APIs that you can use in
your Spark program to get the application id.
Here is the link to the YarnClient javadoc:
http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/yarn/client/api/YarnClient.html
getApplications() is the method for your
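A rough, untested sketch of listing applications with YarnClient; the configuration handling here is simplified:

import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.client.api.YarnClient
import scala.collection.JavaConverters._

val yarnClient = YarnClient.createYarnClient()
yarnClient.init(new YarnConfiguration())
yarnClient.start()

// getApplications() returns an ApplicationReport per application known to the RM.
yarnClient.getApplications().asScala.foreach { report =>
  println(s"${report.getApplicationId} -> ${report.getName} (${report.getYarnApplicationState})")
}
yarnClient.stop()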
Yes, you can do it unless the method is marked static/final.
Most of the methods in SparkContext are marked static, so you can't
override them; otherwise, overriding would usually work.
Thanks
Deepak
On Fri, Jan 8, 2016 at 12:06 PM, yuliya Feldman wrote:
>
Invalid jobj 2. If SparkR was restarted, Spark operations need to be
> re-executed.
>
>
> Not sure what is causing this? Any leads or ideas? I am using rstudio.
>
>
>
> On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <deepakmc...@gmail.com>
> wrote:
>
>> Hi Sandee
Hi Sandeep
I am not sure if ORC can be read directly in R.
But there is a workaround: first create a Hive table on top of the ORC files,
and then access the Hive table in R.
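A sketch of that workaround, with hypothetical column names and ORC location; once the table exists you can query it from R through the Hive integration:

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
hc.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS orc_table (id STRING, amount DOUBLE)
  STORED AS ORC
  LOCATION '/data/orc_files'
""")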
Thanks
Deepak
On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana
wrote:
> Hello
>
> I need to read an ORC
I am not sure if Spark provides any inherent support for incremental
extracts.
But you can maintain a file, e.g. extractRange.conf in HDFS: read the end
range from it, and have the Spark job update it with the new end range
before it finishes, so the relevant range is used next time.
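A rough sketch of that bookkeeping, assuming extractRange.conf just holds the last extracted end value (the path and the extract logic are placeholders):

import org.apache.hadoop.fs.{FileSystem, Path}
import java.io.{BufferedReader, InputStreamReader, PrintWriter}

val fs = FileSystem.get(sc.hadoopConfiguration)
val confPath = new Path("/conf/extractRange.conf")

// Read the previous end range at the start of the job.
val in = new BufferedReader(new InputStreamReader(fs.open(confPath)))
val lastEnd = in.readLine().trim.toLong
in.close()

// ... extract only the records in (lastEnd, newEnd] ...
val newEnd = System.currentTimeMillis()   // stand-in for the new high-water mark

// Overwrite the file with the new end range before the job finishes.
val out = new PrintWriter(fs.create(confPath, true))
out.println(newEnd)
out.close()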
Hi Ajay
Looking at your Spark code, I can see you used a HiveContext.
Can you try using a SQLContext instead of the HiveContext there?
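Something along these lines (paths, format and the table name are placeholders):

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.parquet("/path/to/data")
df.registerTempTable("my_table")
sqlContext.sql("SELECT count(*) FROM my_table").show()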
Thanks
Deepak
On Mon, Jun 13, 2016 at 10:15 PM, Ajay Chander wrote:
> Hi Mohit,
>
> Thanks for your time. Please find my response below.
>
> Did
Hi Saurabh
You can have a Hadoop cluster running YARN as the scheduler.
Configure Spark to run with the same YARN setup.
Then you need R on only one node, and you can connect to the cluster using
SparkR.
Thanks
Deepak
On Mon, May 30, 2016 at 12:12 PM, Jörn Franke wrote:
>
> Well if
Hi Mayuresh
Instead of s3a, have you tried the https:// URI for the same S3 bucket?
HTH
Deepak
On Tue, May 31, 2016 at 4:41 PM, Mayuresh Kunjir
wrote:
>
>
> On Tue, May 31, 2016 at 5:29 AM, Steve Loughran
> wrote:
>
>> which s3 endpoint?
>>
>>
>
Hello All,
I am looking for a use case where anyone has used Spark Streaming
integration with LinkedIn.
--
Thanks
Deepak
There is a Spark action defined for Oozie workflows.
Though I am not sure if it supports only Java Spark jobs or Scala jobs as
well.
https://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html
Thanks
Deepak
On Mon, Mar 7, 2016 at 2:44 PM, Divya Gehlot
wrote:
> Hi,
>
Hi Rafael
If you are using YARN as the engine, you can always use the RM UI to see the
application progress.
Thanks
Deepak
On Tue, Apr 5, 2016 at 12:18 PM, Rafael Barreto
wrote:
> Hello,
>
> I have a driver deployed using `spark-submit` in supervised cluster mode.
>
Hi
I am reading a text file with 16 fields.
All the placeholders for the values of this text file have been defined in,
say, 2 different case classes:
Case1 and Case2
How do I map the values read from the text file, so my Scala function can
return 2 different RDDs, with each RDD of
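Roughly what I have in mind, simplified to a few fields (the real file has 16; the path and field splits are placeholders):

case class Case1(f1: String, f2: String)
case class Case2(f3: String, f4: String)

val lines = sc.textFile("/data/input.txt").map(_.split(","))
val rdd1 = lines.map(a => Case1(a(0), a(1)))
val rdd2 = lines.map(a => Case2(a(2), a(3)))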
Since you are registering workers from the same node, do you have enough
cores and RAM (in this case >= 9 cores and >= 24 GB) on this
node (11.14.224.24)?
Thanks
Deepak
On Wed, May 11, 2016 at 9:08 PM, شجاع الرحمن بیگ
wrote:
> Hi All,
>
> I need to set same memory and
(Marketing Platform-BLR) <
rakes...@flipkart.com> wrote:
> Yes, it seems to be the case.
> In this case executors should have continued logging values till 300, but
> they are shutdown as soon as i do "yarn kill .."
>
> On Thu, May 12, 2016 at 12:11 PM Deepak Sharma
Hi Rakesh
Did you try setting spark.streaming.stopGracefullyOnShutdown to true in
your Spark configuration?
If not, try this and let us know if it helps.
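A minimal sketch of where the setting goes (app name and batch duration are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("my-streaming-app")
  .set("spark.streaming.stopGracefullyOnShutdown", "true")
val ssc = new StreamingContext(conf, Seconds(10))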
Thanks
Deepak
On Thu, May 12, 2016 at 11:42 AM, Rakesh H (Marketing Platform-BLR) <
rakes...@flipkart.com> wrote:
> Issue i
Hi
I have a Scala program consisting of Spark core and Spark Streaming APIs.
Is there any open source tool that I can use to debug the program for
performance reasons?
My primary interest is to find the blocks of code that would be executed on
the driver and what would go to the executors.
Is there JMX
dead and it shuts down abruptly.
>> Could this issue be related to yarn? I see correct behavior locally. I
>> did "yarn kill " to kill the job.
>>
>>
>> On Thu, May 12, 2016 at 12:28 PM Deepak Sharma <deepakmc...@gmail.com>
>> wrote:
>>
er$: VALUE -> 205
> 16/05/12 10:18:29 INFO processors.StreamJobRunner$: VALUE -> 206
>
>
>
>
>
>
> On Thu, May 12, 2016 at 11:45 AM Deepak Sharma <deepakmc...@gmail.com>
> wrote:
>
>> Hi Rakesh
>> Did you tried se
Spark 2.0 is yet to come out as a public release.
I am waiting to get my hands on it as well.
Please do let me know if I can download the source and build Spark 2.0 from
GitHub.
Thanks
Deepak
On Fri, May 6, 2016 at 9:51 PM, Sunita Arvind wrote:
> Hi All,
>
> We are evaluating a
With Structured Streaming, Spark provides APIs over the Spark SQL engine.
Once you have the structured stream and a DataFrame created out of it, you
can do ad-hoc querying on the DF, which means you are actually querying the
stream without having to store or transform it first.
I have not used
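A minimal sketch of the idea based on the Spark 2.0 API docs (the source and the query are placeholders, not something I have run):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("structured-stream-sketch").getOrCreate()

// A streaming DataFrame from a socket source.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Ad-hoc style query directly on the streaming DataFrame.
val counts = lines.groupBy("value").count()

counts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
  .awaitTermination()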
Hi Ajay
You can look at the wholeTextFiles method, which gives an RDD[(String, String)],
and then use saveAsTextFile on each of the resulting RDDs.
This will serve the purpose.
I don't think anything like distcp exists by default in Spark.
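A rough sketch of that idea (the cluster paths and the codec are placeholders):

import org.apache.hadoop.io.compress.GzipCodec

// Read whole files as (path, content) pairs, keep the content, and write it
// compressed to the target cluster.
val files = sc.wholeTextFiles("hdfs://clusterA/data/input")
files.values.saveAsTextFile("hdfs://clusterB/data/output", classOf[GzipCodec])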
Thanks
Deepak
On 10 May 2016 11:27 pm, "Ajay Chander" wrote:
> Hi
then apply
> compression codec on it, save the rdd to another Hadoop cluster?
>
> Thank you,
> Ajay
>
> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
>
>> Hi Ajay
>> You can look at wholeTextFiles method of rdd[string,string] and
Hi Tapan
I would suggest an architecture where you have separate storage and data
serving layers.
Spark is still best for batch processing of data.
So what I am suggesting here is: you can have your data stored as-is in
some HDFS raw layer, run your ELT in Spark on this raw data, and
Once you download Hadoop and format the namenode, you can use start-dfs.sh
to start HDFS.
Then use 'jps' to see if the datanode/namenode services are up and running.
Thanks
Deepak
On Mon, Apr 18, 2016 at 5:18 PM, My List wrote:
> Hi ,
>
> I am a newbie on Spark.I wanted to
binary format or will have to build it?
> 3) Is there a basic tutorial for Hadoop on windows for the basic needs of
> Spark.
>
> Thanks in Advance !
>
> On Mon, Apr 18, 2016 at 5:35 PM, Deepak Sharma <deepakmc...@gmail.com>
> wrote:
>
>> Once you download hadoop
re Galore on Spark.
> Since I am starting afresh, what would you advice?
>
> On Mon, Apr 18, 2016 at 5:45 PM, Deepak Sharma <deepakmc...@gmail.com>
> wrote:
>
>> Binary for Spark means it's Spark built against Hadoop 2.6
>> It will not have any hadoop executables.
as trying to
> run big data stuff on windows. Have run in so much of issues that I could
> just throw the laptop with windows out.
>
> Your view - Redhat, Ubuntu or Centos.
> Does Redhat give a one year licence on purchase etc?
>
> Thanks
>
> On Mon, Apr 18, 2016 at
Hi all,
I am looking for an architecture to ingest 10 million messages in micro
batches of seconds.
If anyone has worked on a similar kind of architecture, can you please
point me to any documentation around the same, like what should be the
architecture, and which components/big data
Hi
Does anyone use, or know about, a GitHub repo that can help me get started
with image and video processing using Spark?
The images/videos will be stored in S3, and I am planning to use S3 with
Spark.
In this case, how will Spark achieve distributed processing?
Any code base or references is
ll config to overcome this.
> Tried almost everything i could after searching online.
>
> Any help from the mailing list would be appreciated.
>
> On Thu, Aug 4, 2016 at 7:43 AM, Deepak Sharma <deepakmc...@gmail.com>
> wrote:
>
>> I am facing the same issue with spark 1.5
I am facing the same issue with Spark 1.5.2.
If the file being processed by Spark is 10-12 MB in size, it
throws out of memory.
But if the same file is within the 5 MB limit, it runs fine.
I am using a Spark configuration with 7 GB of memory and 3 cores for executors
in the cluster of 8
In Spark Streaming, you have to decide the duration of the micro batches to
run.
Once you get the micro batch, transform it as per your logic, and then you
can use saveAsTextFiles on the resulting DStream to write it to HDFS.
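A minimal sketch (the source, batch duration and paths are placeholders):

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(30))          // 30-second micro batches
val stream = ssc.textFileStream("hdfs:///data/incoming")

val transformed = stream.filter(_.nonEmpty)              // stand-in for the real transformation
transformed.saveAsTextFiles("hdfs:///data/out/batch")    // one output directory per micro batch

ssc.start()
ssc.awaitTermination()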
Thanks
Deepak
On 20 Jul 2016 9:49 am, wrote:
I am using a DAO in a Spark application to write the final computation to
Cassandra, and it performs well.
What kind of issues do you foresee using a DAO for HBase?
Thanks
Deepak
On 19 Jul 2016 10:04 pm, "Yu Wei" wrote:
> Hi guys,
>
>
> I write spark application and want to store
Hi Phil
I guess for() is executed on the driver, while foreach() will execute in
parallel on the executors.
You can try this without collecting the RDD; try both.
foreach in this case would print on the executors, and you would not see
anything on the driver console.
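A quick way to see the difference, assuming a simple RDD:

val rdd = sc.parallelize(1 to 10)

rdd.foreach(println)            // runs on the executors; output appears in executor stdout
rdd.collect().foreach(println)  // collect brings the data back, so this prints on the driver console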
Thanks
Deepak
On Tue, Jul 12, 2016 at 9:28 PM,
You have to distribute the files in some distributed file system like HDFS.
Otherwise copy the files to every executor's local file system and make sure
to mention the file scheme in the URI explicitly.
Thanks
Deepak
On Thu, Jul 7, 2016 at 7:13 PM, Balachandar R.A.
wrote:
Yes. You can do something like this:
.map(x => mapfunction(x))
Thanks
Deepak
On 9 Jul 2016 9:22 am, "charles li" wrote:
>
> hi, guys, is there a way to dynamic load files within the map function.
>
> i.e.
>
> Can I code as bellow:
>
>
>
>
> thanks a lot.
>
>
>
> --
>
rote:
> I found following links are good as I am using same.
>
> http://spark.apache.org/docs/latest/tuning.html
>
> https://spark-summit.org/2014/testing-spark-best-practices/
>
> Regards,
> Vaquar khan
>
> On 8 Aug 2016 10:11, "Deepak Sharma" <deepakmc.
Hi All,
Can anyone please point me to any documents around Spark/Scala
best practices?
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
Can you please post the code snippet and the error you are getting ?
-Deepak
On 9 Aug 2016 12:18 am, "manish jaiswal" wrote:
> Hi,
>
> I am not able to read data from hive transactional table using sparksql.
> (i don't want read via hive jdbc)
>
>
>
> Please help.
>
Register your dataframes as temp tables and then try the join on the temp
tables.
This should resolve your issue.
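A sketch of the suggestion, assuming dfA, dfB and a sqlContext are already defined (table and column names are placeholders):

dfA.registerTempTable("table_a")
dfB.registerTempTable("table_b")

val joined = sqlContext.sql(
  "SELECT a.id, a.name, b.value FROM table_a a JOIN table_b b ON a.id = b.id")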
Thanks
Deepak
On Mon, Aug 8, 2016 at 11:47 PM, Ashic Mahtab wrote:
> Hello,
> We have two parquet inputs of the following form:
>
> a: id:String, Name:String (1.5TB)
I am doing a join between one dataframe and an empty dataframe.
The first dataframe has almost 50k records.
This operation never returns and runs indefinitely.
Is there any solution to get around this?
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
Hi Danellis
For point 1, Spark Streaming is something to look at.
For point 2, you can create a DAO from Cassandra on each stream processing
pass. This may be a costly operation, but to do real-time processing of
data, you have to live with it.
Point 3 is covered in point 2 above.
Since you are
Hi Devi
Please make sure the JDBC jar is in the Spark classpath.
With spark-submit, you can use the --jars option to specify the SQL Server
JDBC jar.
Thanks
Deepak
On Mon, Aug 8, 2016 at 1:14 PM, Devi P.V wrote:
> Hi all,
>
> I am trying to write a spark dataframe into MS-Sql
Yes. I am using Spark for ETL, and I am sure there are lots of other companies
who are using Spark for ETL.
Thanks
Deepak
On 2 Aug 2016 11:40 pm, "Rohit L" wrote:
> Does anyone use Spark for ETL?
>
> On Tue, Aug 2, 2016 at 1:24 PM, Sonal Goyal wrote:
atic
> write a size properly for what I already set in Alluxio 512MB per block.
>
>
> On Jul 1, 2016, at 11:01 AM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>
> Before writing, coalesce your RDD to 1.
> It will create only 1 output file.
> Multiple part files happen
You can use spark-testing-base's RDD comparators.
Create 2 different dataframes from these 2 Hive tables.
Convert them to RDDs and use spark-testing-base's compareRDD.
Here is an example of RDD comparison:
https://github.com/holdenk/spark-testing-base/wiki/RDDComparisons
On Mon, Jan 30, 2017 at
Hi There,
Are there any examples of using GraphX along with any graph DB?
I am looking to persist the graph in a graph-based DB and then read it back
into Spark and process it using GraphX.
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
The better way is to read the data directly into Spark using Spark SQL's
read jdbc.
Apply the UDFs locally.
Then save the dataframe back to Oracle using the dataframe's write jdbc.
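A rough sketch, assuming a SparkSession named spark; connection details, table names and the UDF are placeholders:

import java.util.Properties
import org.apache.spark.sql.functions.udf

val props = new Properties()
props.setProperty("user", "scott")
props.setProperty("password", "tiger")
props.setProperty("driver", "oracle.jdbc.OracleDriver")
val url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"

val df = spark.read.jdbc(url, "SOURCE_TABLE", props)

val myUdf = udf((s: String) => s.toUpperCase)            // stand-in for the real UDF logic
val transformed = df.withColumn("NEW_COL", myUdf(df("SOME_COL")))

transformed.write.mode("append").jdbc(url, "TARGET_TABLE", props)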
Thanks
Deepak
On Jan 29, 2017 7:15 PM, "Jörn Franke" wrote:
> One alternative could be the
Can you try writing the UDF directly in Spark and registering it with the
Spark SQL or Hive context?
Or do you want to reuse the existing Hive UDF jar in Spark?
Thanks
Deepak
On Jan 24, 2017 5:29 PM, "Sirisha Cheruvu" wrote:
> Hi Team,
>
> I am trying to keep below code in
Yes.
I will be there before 4 PM.
What's your contact number?
Thanks
Deepak
On Thu, Jan 19, 2017 at 2:38 PM, Sirisha Cheruvu wrote:
> Are we meeting today?!
>
> On Jan 18, 2017 8:32 AM, "Sirisha Cheruvu" wrote:
>
>> Hi ,
>>
>> Just thought of keeping my
On the SQLContext or HiveContext, you can register the function as a UDF as
below:
hiveSqlContext.udf.register("func_name", func(_: String))
Thanks
Deepak
On Wed, Jan 18, 2017 at 8:45 AM, Sirisha Cheruvu wrote:
> Hey
>
> Can yu send me the source code of hive java udf which
Did you try this with spark-shell?
Please try this:
$ spark-shell --jars /home/cloudera/Downloads/genudnvl2.jar
On the Spark shell:
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
hc.sql("create temporary function nexr_nvl2 as 'com.nexr.platform.hive.udf.GenericUDFNVL2'")
From the Spark documentation page:
Spark SQL can now run all 99 TPC-DS queries.
On Jan 18, 2017 9:39 AM, "Rishabh Bhardwaj" wrote:
> Hi All,
>
> Does Spark 2.0 Sql support full ANSI SQL query standards?
>
> Thanks,
> Rishabh.
>
The slides say that the default number of partitions is 2; however it's 1
(looking at the output of toDebugString).
Appreciate any help.
Thanks
Deepak Sharma
No, it's not required for a UDF.
It's required when you convert from an RDD to a DF.
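An example of where the import actually matters (converting an RDD to a DF; the case class is just a placeholder):

import sqlContext.implicits._

case class Person(name: String, age: Int)
val rdd = sc.parallelize(Seq(Person("a", 30), Person("b", 40)))
val df = rdd.toDF()   // toDF comes from the implicits import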
Thanks
Deepak
On 8 Sep 2016 2:25 pm, "Divya Gehlot" wrote:
> Hi,
>
> Is it necessary to import sqlContext.implicits._ whenever define and
> call UDF in Spark.
>
>
> Thanks,
> Divya
>
>
>
Is it possible to execute any query using SQLContext even if the DB is
secured using roles or tools such as Sentry?
Thanks
Deepak
On Tue, Aug 30, 2016 at 7:52 PM, Rajani, Arpan
wrote:
> Hi All,
>
> In our YARN cluster, we have setup spark 1.6.1 , we plan to give
Dataframes are immutable in nature, so I don't think you can directly
assign or change values in a column.
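The usual pattern is to derive a new DataFrame with the changed values instead, for example (the column names and condition are placeholders):

import org.apache.spark.sql.functions.when

val updated = df.withColumn("status", when(df("score") > 50, "pass").otherwise("fail"))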
Thanks
Deepak
On Fri, Sep 9, 2016 at 10:59 PM, xingye wrote:
> I have some questions about assign values to a spark dataframe. I want to
> assign values to an
I am not sure about EMR, but it seems multi-tenancy is not enabled in your
case.
Multi-tenancy means all the applications have to be submitted to different
queues.
Thanks
Deepak
On Wed, Sep 14, 2016 at 11:37 AM, Divya Gehlot
wrote:
> Hi,
>
> I am on EMR cluster and My
Use yarn-client mode and you can see the logs on the console after you submit.
On Tue, Sep 13, 2016 at 11:47 AM, Divya Gehlot
wrote:
> Hi,
>
> Some how for time being I am unable to view Spark Web UI and Hadoop Web
> UI.
> Looking for other ways ,I can check my job is
What is the message inflow?
If it's really high, Spark will definitely be of great use.
Thanks
Deepak
On Sep 29, 2016 19:24, "Ali Akhtar" wrote:
> I have a somewhat tricky use case, and I'm looking for ideas.
>
> I have 5-6 Kafka producers, reading various APIs, and
One of the current best from what I've worked with is
>> Citus.
>>
>> On Thu, Sep 29, 2016 at 10:15 AM, Deepak Sharma <deepakmc...@gmail.com>
>> wrote:
>> > Hi Cody
>> > Spark direct stream is just fine for this use case.
>> > But why post
Ali Akhtar <ali.rac...@gmail.com> wrote:
> > Is there an advantage to that vs directly consuming from Kafka? Nothing
> is
> > being done to the data except some light ETL and then storing it in
> > Cassandra
> >
> > On Thu, Sep 29, 2016 at 7:58 PM, Deepak Sharma &l
Hi Anupama
To me it looks like an issue with the SPN with which you are trying to connect
to hive2, i.e. hive@hostname.
Are you able to connect to Hive from spark-shell?
Try getting the ticket using any other user's keytab (but not the hadoop
services keytab) and then try running the spark-submit.
Thanks
Enrich the RDDs first with more information and then map them to a case
class, if you are using Scala.
You can then use the Play API classes
(play.api.libs.json.Writes / play.api.libs.json.Json) to convert the
mapped case class to JSON.
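A sketch of the case-class-to-JSON step (the case class, its fields and eventRdd are hypothetical):

import play.api.libs.json.{Json, Writes}

case class Event(id: String, value: Double, region: String)

// eventRdd: RDD[Event] built from the enriched data.
// The Writes instance is created inside mapPartitions so it does not need to be
// serialized and shipped from the driver.
val jsonRdd = eventRdd.mapPartitions { it =>
  implicit val eventWrites: Writes[Event] = Json.writes[Event]
  it.map(e => Json.toJson(e).toString())
}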
Thanks
Deepak
On Tue, Sep 20, 2016 at 6:42 PM, sujeet jog
Hi Subhajit
Try this in your join:
val df = sales_demand.join(product_master,
  sales_demand("INVENTORY_ITEM_ID") === product_master("INVENTORY_ITEM_ID"), "inner")
On Tue, Aug 23, 2016 at 2:30 AM, Subhajit Purkayastha
wrote:
> *All,*
>
>
>
> *I
On Tue, Aug 23, 2016 at 10:32 AM, Deepak Sharma <deepakmc...@gmail.com>
wrote:
> val df = sales_demand.join(product_master,
>   sales_demand("INVENTORY_ITEM_ID") === product_master("INVENTORY_ITEM_ID"), "inner")
Ignore
Hi Rohit
You can use accumulators and increment one on every record processed.
At the end you can get the value of the accumulator on the driver, which will
give you the count.
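A minimal sketch with a long accumulator (Spark 2.x API; spark, df and the processing step are placeholders):

val processed = spark.sparkContext.longAccumulator("processedRecords")

df.foreach { row =>
  // ... process / write the row ...
  processed.add(1)
}

println(s"Records processed: ${processed.value}")   // read back on the driver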
HTH
Deepak
On Nov 5, 2016 20:09, "Rohit Verma" wrote:
> I am using spark to read from database and
Can you try caching the individual dataframes and then union them?
It may save you time.
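Something like this, assuming the four DataFrames are df1..df4 (union is unionAll on older Spark releases):

val cached = Seq(df1, df2, df3, df4).map(_.cache())
val combined = cached.reduce(_ union _)
combined.count()   // materializes the caches and the union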
Thanks
Deepak
On Wed, Nov 16, 2016 at 12:35 PM, Devi P.V wrote:
> Hi all,
>
> I have 4 data frames with three columns,
>
> client_id,product_id,interest
>
> I want to combine these 4
This is a waste of money, I guess.
On Nov 11, 2016 22:41, "Mich Talebzadeh" wrote:
> starts at $4,000 per node per year all inclusive.
>
> With discount it can be halved but we are talking a node itself so if you
> have 5 nodes in primary and 5 nodes in DR we are talking
There are 8 worker nodes in the cluster .
Thanks
Deepak
On Dec 18, 2016 2:15 AM, "Holden Karau" <hol...@pigscanfly.ca> wrote:
> How many workers are in the cluster?
>
> On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma <deepakmc...@gmail.com>
> wrote:
>
Hi All,
I am iterating over a dataframe's partitions using df.foreachPartition.
Upon each iteration of a row, I am initializing a DAO to insert the row into
Cassandra.
Each of these iterations takes almost one and a half minutes to finish.
In my workflow, this is part of an action, and 100 partitions are
On Sun, Dec 18, 2016 at 2:26 AM, vaquar khan wrote:
> select * from indexInfo;
>
Hi Vaquar
I do not see a CF with the name indexInfo in any of the Cassandra databases.
Thanks
Deepak
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
2016 at 1:49 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:
> This is the correct way to do it.The timestamp that you mentioned was not
> correct:
>
> scala> val ts1 = from_unixtime($"ts"/1000, "yyyy-MM-dd")
> ts1: org.apache.spark.sql.Column =
|3bc61951-0f49-43b...|1477983725292|2016-11-01|
|688acc61-753f-4a3...|1479899459947|2016-11-23|
|5ff1eb6c-14ec-471...|1479901374026|2016-11-23|
+--------------------+-------------+----------+
Thanks
Deepak
On Mon, Dec 5, 2016 at 1:46 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>
This is how you can do it in scala:
scala> val ts1 = from_unixtime($"ts", "yyyy-MM-dd")
ts1: org.apache.spark.sql.Column = fromunixtime(ts,yyyy-MM-dd)
scala> val finaldf = df.withColumn("ts1",ts1)
finaldf: org.apache.spark.sql.DataFrame = [client_id: string, ts: string,
ts1: string]
scala>
In Spark 2.0, SparkSession was introduced, which you can use to query
Hive as well.
Just make sure you create the Spark session with the enableHiveSupport() option.
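A minimal sketch (the database and table names are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-query")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SELECT * FROM my_db.my_table LIMIT 10").show()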
Thanks
Deepak
On Thu, Dec 1, 2016 at 12:27 PM, shyla deshpande
wrote:
> I am Spark 2.0.2 , using DStreams
You can read the source into a dataframe.
Then iterate over all rows with map and use something like below:
df.map(x => x(0).toString().toDouble)
Thanks
Deepak
On Tue, Dec 20, 2016 at 3:05 PM, big data wrote:
> our source data are string-based data, like this:
> col1
> On 27 December 2016 at 10:30, Deepak Sharma <deepakmc...@gmail.com> wrote:
>
>> It works for me with spark 1.6 (--jars)
>> Please tr
Hi Mich
You can copy the jar to a shared location and use the --jars command line
argument of spark-submit.
Whoever needs access to this jar can refer to the shared path and
access it using the --jars argument.
Thanks
Deepak
On Tue, Dec 27, 2016 at 3:03 PM, Mich Talebzadeh
Hi All,
I have registered temp tables using a Hive context and a SQL context.
Now when I try to join these 2 temp tables, one of the tables complains about
not being found.
Is there any setting or option so that the tables in these 2 different contexts
are visible to each other?
--
Thanks
Deepak
If the df is empty, the .take would return a
java.util.NoSuchElementException.
This can be done as below:
df.rdd.isEmpty
On Tue, Mar 7, 2017 at 9:33 AM, wrote:
> Dataframe.take(1) is faster.
>
>
>
> *From:* ashaita...@nz.imshealth.com
On Tue, Mar 7, 2017 at 2:37 PM, Nick Pentreath
wrote:
> df.take(1).isEmpty should work
My bad.
It will return empty array:
emptydf.take(1)
res0: Array[org.apache.spark.sql.Row] = Array()
and applying isEmpty would return boolean
emptydf.take(1).isEmpty
res2:
I am trying to connect to the AWS managed ES service using the Spark ES
connector, but am not able to.
I am passing es.nodes and es.port, along with es.nodes.wan.only set to true.
But it fails with the below error:
34 ERROR NetworkClient: Node [x.x.x.x:443] failed (The server x.x.x.x
failed to respond); no
This can be mapped as below:
dataset.map(x => ((x(0), x(1), x(2)), x))
This works with a DataFrame of Rows, but I haven't tried it with a Dataset.
Thanks
Deepak
On Mon, Aug 7, 2017 at 8:21 AM, Jone Zhang wrote:
> val schema = StructType(
> Seq(
> StructField("app",
I am not sure about Java, but in Scala it would be something like:
df.rdd.map{ x => MyClass(x.getString(0), ...) }
HTH
--Deepak
On Dec 19, 2017 09:25, "Sunitha Chennareddy"
wrote:
Hi All,
I am new to Spark, I want to convert DataFrame to List with out
using
I am looking for a similar solution, more aligned to the data scientist group.
The concern I have is about supporting complex aggregations at runtime.
Deepak
On Nov 12, 2017 12:51, "ashish rawat" wrote:
> Hello Everyone,
>
> I was trying to understand if anyone here has
os for your
>> end users; but it sounds like you’ll be using it for exploratory analysis.
>> Spark is great for this ☺
>>
>>
>>
>> -Pat
>>
>>
>>
>>
>>
>> *From: *Vadim Semenov <vadim.seme...@datadoghq.com>
>> *Date: *Su