Does updating a dataframe return a NEW dataframe, like an RDD?
---Original---
From: "vincent gromakowski"
Date: 2017/2/14 01:15:35
To: "Reynold Xin";
Cc: "user";"Mendelson, Assaf";
Subject: Re: is dataframe thread safe?
How about having a thread that updates and caches a dataframe in-memory next to
I would like to have your opinion about an idea I had...
I am thinking of addressing the issue of interactive queries on small/medium
datasets (max 500 GB or 1 TB) with a solution based on the thriftserver and
Spark cache management. Currently the problem of caching the dataset in
Spark is that you ca
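A minimal sketch of that refresh-and-cache idea, assuming a SparkSession and a
Parquet source (the path, view name, and interval are made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cache-refresher").getOrCreate()

val refresher = new Thread {
  override def run(): Unit = {
    while (true) {
      val df = spark.read.parquet("/data/source")  // assumed source path
      df.cache().count()                           // materialize the fresh cache
      df.createOrReplaceTempView("hot_table")      // swap queries over to the new copy
      Thread.sleep(60 * 1000)                      // refresh once a minute
    }
  }
}
refresher.setDaemon(true)
refresher.start()

In a real version the previous cached copy would also need to be unpersisted to
avoid leaking memory.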
Hi Hyukjin,
Thank you very much for this.
Sure, I am going to do it today based on data + Java code.
Many thanks for the support.
Best Regards,
Carlo
On 15 Feb 2017, at 00:22, Hyukjin Kwon
<gurwls...@gmail.com> wrote:
Hi Carlo,
There was a bug in lower versions when accessing nest
You can find the duration time in the web UI, such as http://xxx:8080. It depends
on your settings.
About the shell, I do not know how to check the time.
---Original---
From: "Jacek Laskowski"
Date: 2017/2/8 04:14:58
To: "Mars Xu";
Cc: "user";
Subject: Re: How to get a spark sql statement implement du
We have been measuring JVM heap memory usage in our Spark app by taking
periodic samples of JVM heap memory usage and saving them in our metrics DB.
We do this by spawning a thread in the Spark app and measuring the JVM heap
memory usage every 1 min.
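A rough sketch of that sampling thread, with a println as a stand-in for the
metrics-DB write:

import java.lang.management.ManagementFactory

val sampler = new Thread {
  override def run(): Unit = {
    val heap = ManagementFactory.getMemoryMXBean
    while (true) {
      val usedBytes = heap.getHeapMemoryUsage.getUsed
      println(s"heap.used.bytes=$usedBytes")  // stand-in for the metrics-DB write
      Thread.sleep(60 * 1000)                 // sample every 1 min
    }
  }
}
sampler.setDaemon(true)
sampler.start()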
Is it a fair assumption to conclude that if the
Question is in the title. Can the metric "Peak Execution memory" be used for
Spark app resource tuning? If yes, how? If no, what purpose does it serve
when debugging apps?
What is the right way to unzip a Spark app eventlog saved in snappy format
(.snappy)?
Are there any libraries which we can use to do this programmatically?
What do you want to do with the event log? The Hadoop command line can show
compressed files (hadoop fs -text). Alternatively there are tools depending on
your OS ... you can also write a small job to do this and run it on the cluster.
> On 15 Feb 2017, at 10:55, satishl wrote:
>
> what is th
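If the small-job route is taken, a hedged sketch using Hadoop's codec factory
might look like this (paths are hypothetical, and whether the codec framing
matches depends on how the event log was written):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils
import org.apache.hadoop.io.compress.CompressionCodecFactory

val conf = new Configuration()
val fs = FileSystem.get(conf)
val in = new Path("/spark-events/app-1234.snappy")         // hypothetical event log
val codec = new CompressionCodecFactory(conf).getCodec(in) // resolved from the .snappy suffix
val decompressed = codec.createInputStream(fs.open(in))
val out = fs.create(new Path("/tmp/app-1234.log"))
IOUtils.copyBytes(decompressed, out, conf)                 // copies and closes both streams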
Hi,
I am trying to create multiple notebooks connecting to Spark on YARN. After
starting a few jobs my cluster ran out of containers. All new notebook
requests are in a busy state, as the Jupyter kernel gateway is not getting any
containers for the master to be started.
Some jobs are not releasing the containers
Hi folks,
How can I force Spark SQL to recursively read data stored in Parquet format from
subdirectories? In Hive, I could achieve this by setting a few Hive configs.
set hive.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;
set hive.supports.subdirectories=true;
set mapre
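For what it's worth, one hedged workaround on the Spark side: file datasources
expand glob patterns, so subdirectory levels can be listed explicitly (paths
are assumptions):

val oneLevel = spark.read.parquet("/data/mytable/*")    // files one level down
val twoLevels = spark.read.parquet("/data/mytable/*/*") // files two levels down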
Thanks Yong.
I know about the schema-merging option.
Using Hive we can read AVRO files having different schemas, and we can do
the same in Spark.
Similarly we can read ORC files having different schemas in Hive. But we can’t
do the same in Spark using dataframes. How can we do it using
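For reference, the schema-merging option mentioned above applies to Parquet
reads; a one-line sketch (the path is an assumption):

val merged = spark.read.option("mergeSchema", "true").parquet("/data/parquet_table")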
Hi ,
Released the latest version of the Receiver-based Kafka Consumer for Spark Streaming.
Available at Spark Packages: https://spark-packages.org/package/dibbhatt/kafka-spark-consumer
Also at github : https://github.com/dibbhatt/kafka-spark-consumer
Some key features
- Tuned for better perform
If it works under Hive, did you try just creating the DF from the Hive table directly
in Spark? That should work, right?
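Something like this, with a made-up table name:

val df = spark.table("mydb.mytable")  // reads the Hive table straight into a DataFrame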
Yong
From: Begar, Veena
Sent: Wednesday, February 15, 2017 10:16 AM
To: Yong Zhang; smartzjp; user@spark.apache.org
Subject: RE: How to specify de
Hello
I have loaded 3 dataframes from 3 different static tables. Now I got the
csv file, and with the help of Spark I loaded the csv into a dataframe and
registered it as a temporary table named "Employee".
Now I need to enrich the columns in the Employee DF and query any of the 3
static tables respectively with so
Could you just make Hadoop's resource manager (port 8088) available to your
users, and they can check available containers that way if they see the
launch is stalling?
Another option is to reduce the default # of executors and memory per
executor in the launch script to some small fraction of your
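For example, trimming the per-notebook footprint in the launch configuration
(the values are placeholders):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.instances", "2")
  .set("spark.executor.memory", "1g")
  .set("spark.dynamicAllocation.enabled", "true")  // hands back idle executors; needs the external shuffle service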
Hello
We want to enrich our Spark RDD, loaded with multiple columns and multiple
rows, with 3 different tables that I loaded into 3 different Spark dataframes.
Can we write some logic in Spark so I can enrich my Spark RDD with these
different static tables?
Thanks
You can do a join or a union to combine all the dataframes into one fat
dataframe,
or do a select on the columns you want to produce your transformed dataframe.
Not sure if I understand the question though. If the goal is just an end-state
transformed dataframe, that can easily be done.
Regards
Sam
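A hedged sketch of the join approach, with made-up table and column names
(staticDf1 stands for one of the three static dataframes):

val staticDf1 = spark.table("static_table_1")  // one of the three static tables
val employee = spark.table("Employee")
val enriched = employee
  .join(staticDf1, Seq("dept_id"), "left")  // assumed join key
  .select("emp_id", "name", "dept_name")    // keep only the columns you need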
Hello all,
I am having some problems with my custom Java-based receiver. I am running
Spark 1.5.0 and I used the template on the Spark website
(http://spark.apache.org/docs/1.0.0/streaming-custom-receivers.html). Basically
my receiver listens to a JMS queue (Solace) and then, based on the size
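For reference, the skeleton of that receiver pattern from the docs looks
roughly like this; the JMS/Solace wiring is omitted and the onStart body is a
placeholder:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class JmsReceiver(brokerUrl: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    new Thread("JMS Receiver") {
      override def run(): Unit = {
        // connect to the JMS queue here and call store() per message
        while (!isStopped()) {
          store("message")  // placeholder for a received JMS message
        }
      }
    }.start()
  }

  def onStop(): Unit = {
    // close the JMS connection here
  }
}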
Hi,
I am using Spark temporary tables to write data back to Hive. I have seen
weird behavior of .hive-staging files after job completion. Does anyone
know how to delete them, or prevent them from being created, when writing
data into Hive.
Thanks,
Asmath
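One hedged idea, not verified across versions: relocate the staging directory
so the leftovers land in a scratch area that can be cleaned separately (the
setting and path are assumptions):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("hive.exec.stagingdir", "/tmp/hive-staging")  // assumed to be honored by the Hive write path
  .enableHiveSupport()
  .getOrCreate()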
Hello All,
What would be the best approaches to monitor Spark performance? Are there any
tools for Spark job performance monitoring?
Thanks.
I know of the following tools:
https://sites.google.com/site/sparkbigdebug/home
https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark
https://github.com/SparkMonitor/varOne
https://github.com/groupon/sparklint
Chetan Khatri wrote on Thu
Hi Pavel!
Sorry for the late reply. I did some investigation over the last few days with my colleague.
Here is my thought: since Spark 1.2, we use Netty with off-heap memory to reduce
GC during shuffle and cache block transfer. In my case, if I try to increase
the memory overhead enough, I will get the Max dir
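For reference, the overhead knob being discussed, set programmatically (the
value is a guess):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.executor.memoryOverhead", "2048")  // MB reserved for off-heap use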
Thank you Georg
On Thu, Feb 16, 2017 at 12:30 PM, Georg Heiler
wrote:
> I know of the following tools
> https://sites.google.com/site/sparkbigdebug/home
> https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark
> https://github.com/Sp