Re: Need help with the configuration for AWS glue jobs

2022-06-23 Thread Sid
Where can I find information on the size of the datasets supported by AWS Glue? I didn't see it in the documentation. Also, if I want to process TBs of data, e.g. 1 TB, what should the ideal EMR cluster configuration be? Could you please guide me on this? Thanks, Sid. On Thu, 23 Jun 2022, 23:44

Re: Need help with the configuration for AWS glue jobs

2022-06-23 Thread Gourav Sengupta
Please use EMR; Glue is not made for heavy processing jobs. On Thu, Jun 23, 2022 at 6:36 AM Sid wrote: > Hi Team, > > Could anyone help me with the below problem: > > > https://stackoverflow.com/questions/72724999/how-to-calculate-number-of-g-1-workers-in-aws-glue-for-processing-1tb-data > >

Need help with the configuration for AWS glue jobs

2022-06-22 Thread Sid
Hi Team, Could anyone help me with the below problem: https://stackoverflow.com/questions/72724999/how-to-calculate-number-of-g-1-workers-in-aws-glue-for-processing-1tb-data Thanks, Sid

Need help on migrating Spark on Hortonworks to Kubernetes Cluster

2022-05-08 Thread Chetan Khatri
Hi Everyone, I need help with my Airflow DAG, which has a Spark submit step; I now have a Kubernetes cluster instead of the Hortonworks Linux distributed Spark cluster. My existing spark-submit is through a BashOperator, as below: calculation1 = '/usr/hdp/2.6.5.0-292/spark2/bin/spark-submit --conf
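
For reference, a spark-submit against a Kubernetes master can be wrapped in the same BashOperator; a rough sketch (the API-server URL, namespace, image, and jar path below are placeholders, not values from the original post):

    spark-submit \
      --master k8s://https://<k8s-apiserver-host>:6443 \
      --deploy-mode cluster \
      --name calculation1 \
      --conf spark.kubernetes.namespace=default \
      --conf spark.kubernetes.container.image=<registry>/spark:<tag> \
      local:///opt/spark/app/my-app.jar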

Re: Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-06 Thread Mich Talebzadeh
OK Nick, let us have a look at this. Your raw size is variable per the JSON: one row may be x bytes and another x*y bytes. Sounds like you run this batch through some cron or Airflow and you carry on from where checkpointLocation points to the last processed records. You end up with executors

Re: Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-05 Thread Nick Grigoriev
Hi Mich, Thanks for the quick response. 1. No, I use a batch query with fixed start and end offsets. 2. Yes, my messages in Kafka (JSON format) can vary greatly in size, from 1 KB to 9 KB, and even when I transform the JSON to a flat Spark SQL row it can still vary in size. 3. I have two

Re: Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-04 Thread Mich Talebzadeh
Hi Nick, I looked at both this thread and your SO question. Trying to understand: 1. You are reading from Kafka via Spark Structured Streaming. 2. Your messages from Kafka are not uniform, meaning you may get variable record sizes in each window. 3. How are you processing these

Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-04 Thread Nick Grigoriev
I have asked this question on Stack Overflow, but it looks too complex for a Q/A resource. https://stackoverflow.com/questions/68236323/spark-aqe-post-shuffle-partitions-coalesce-dont-work-as-expected-and-even-make So I want to ask for help here. I use a global sort on my Spark DF, and when I enable AQE
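
For anyone following along, the coalescing behavior under discussion is governed by a handful of AQE settings; a minimal sketch as of Spark 3.1 (the values shown are illustrative only, not recommendations):

    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
    // Target size AQE aims for when merging small post-shuffle partitions:
    spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64m")
    // Lower bound on how far AQE may coalesce the shuffle partitions:
    spark.conf.set("spark.sql.adaptive.coalescePartitions.minPartitionNum", "1")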

Need help to create database and integration with Spark App in local machine

2021-06-12 Thread Himanshu Soni
Hi Team, Could you please help with the below: 1. I want to create a database (Oracle) with some tables on my local machine. 2. Integrate the database tables so I can query them from a Spark app on my local machine. Thanks & Regards- Himanshu Soni Mobile: +91 8411000279
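
For point 2, a minimal sketch of reading an Oracle table over JDBC from a local Spark app (the connection URL, credentials, and table name are placeholders, and the Oracle JDBC driver jar must be on the classpath, e.g. via --jars):

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//localhost:1521/XEPDB1")  // placeholder URL
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("dbtable", "MY_SCHEMA.MY_TABLE")
      .option("user", "scott")
      .option("password", "tiger")
      .load()
    df.createOrReplaceTempView("my_table")  // then query it via spark.sql(...)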

Need help on Calling Pyspark code using Wheel

2020-10-23 Thread Sachit Murarka
Hi Users, I have created a wheel file using Poetry. I tried running the following commands to run a Spark job using the wheel, but it is not working. Can anyone please let me know the invocation step for the wheel file? spark-submit --py-files /path/to/wheel spark-submit --files /path/to/wheel
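
For context, a wheel is usually shipped with --py-files alongside a small entry-point script, since the wheel itself is not a runnable main; a sketch with made-up file names (this generally works for pure-Python wheels, which are importable zip archives):

    spark-submit \
      --py-files /path/to/mypkg-0.1.0-py3-none-any.whl \
      /path/to/main.py

where main.py imports the package (e.g. from mypkg import job) and calls its entry point.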

Need help with Application Detail UI link

2019-12-05 Thread Sunil Patil
Hello, I am running a standalone Spark server in EC2. My cluster has 1 master and 16 worker nodes. I have a Jenkins server that calls the spark-submit command like this: /mnt/services/spark/bin/spark-submit --master spark://172.22.6.181:6066 --deploy-mode cluster --conf spark.driver.maxResultSize=1g

Re: Need help regarding logging / log4j.properties

2019-10-31 Thread Roland Johann
Hi Debu, you need to define Spark config properties before the jar file path in spark-submit. Everything after the jar path will be passed as arguments to your application. Best Regards Debabrata Ghosh wrote on Thu, 31 Oct 2019 at 03:26: > Greetings All ! > > I needed some help in
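
To illustrate the ordering Roland describes, a sketch (the class, file names, and application arguments are placeholders; the exact -Dlog4j.configuration syntax may vary with your log4j setup):

    spark-submit \
      --class com.example.MyApp \
      --files /path/to/log4j.properties \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
      myapp.jar appArg1 appArg2

Everything after myapp.jar goes to the application itself.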

Need help regarding logging / log4j.properties

2019-10-30 Thread Debabrata Ghosh
Greetings All ! I needed some help in obtaining the application logs, but I am really confused about where they are currently located. Please allow me to explain my problem: 1. I am running the Spark application (written in Java) in a Hortonworks Data Platform Hadoop cluster 2. My spark-submit command is

Re: Need help with SparkSQL Query

2018-12-17 Thread Ramandeep Singh Nanda
You can use analytic (window) functions in Spark SQL. Something like: select * from (select id, row_number() over (partition by id order by timestamp) as rn from root) where rn=1 On Mon, Dec 17, 2018 at 4:03 PM Nikhil Goyal wrote: > Hi guys, > > I have a dataframe of type Record (id: Long, timestamp:

Re: Need help with SparkSQL Query

2018-12-17 Thread Patrick McCarthy
Untested, but something like the below should work: from pyspark.sql import functions as F from pyspark.sql import window as W (record .withColumn('ts_rank', F.dense_rank().over(W.Window.orderBy('timestamp').partitionBy("id"))) .filter(F.col('ts_rank')==1) .drop('ts_rank') ) On Mon, Dec 17,

Need help with SparkSQL Query

2018-12-17 Thread Nikhil Goyal
Hi guys, I have a dataframe of type Record (id: Long, timestamp: Long, isValid: Boolean, other metrics). Schema looks like this: root |-- id: long (nullable = true) |-- timestamp: long (nullable = true) |-- isValid: boolean (nullable = true) . I need to find the earliest valid record
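
Both replies above amount to a window function; a combined sketch in Scala (it assumes a DataFrame named records with the schema above, and filters on isValid first since the goal is the earliest valid record):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val w = Window.partitionBy("id").orderBy("timestamp")
    val earliestValid = records
      .filter(col("isValid"))                  // keep only valid records
      .withColumn("rn", row_number().over(w))  // rank by time within each id
      .filter(col("rn") === 1)                 // earliest per id
      .drop("rn")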

Re: Need help with String Concat Operation

2017-10-18 Thread 高佳翔
Hi Debu, First, instead of using '+', you can use 'concat' to concatenate string columns, and you should enclose "0" in "lit()" to make it a column. Second, 1440 becomes null because you didn't tell Spark what to do when the when clause fails, so it simply sets the value to null. To fix this,
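
A sketch of that fix in Scala (CTOFF is the column named in the original post; the DataFrame name is made up):

    import org.apache.spark.sql.functions._

    val fixed = df.withColumn("CTOFF",
      when(length(col("CTOFF")) === 3, concat(lit("0"), col("CTOFF")))
        .otherwise(col("CTOFF")))  // pass the value through unchanged otherwise

    // Alternatively, lpad(col("CTOFF"), 4, "0") left-pads to four characters in one call.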

Need help with String Concat Operation

2017-10-18 Thread Debabrata Ghosh
Hi, I have a dataframe column (the name of the column is CTOFF) and I intend to prefix it with '0' in case the length of the column is 3. Unfortunately, I am unable to achieve my goal and wonder whether you can help me here. Command which I am executing: ctoff_dedup_prep_temp =

Re: Need help

2017-10-10 Thread Ilya Karpov
I suggest you read «Hadoop Application Architectures» (O'Reilly) by Mark Grover, Ted Malaska and others. There you can find some answers to your questions. > On 10 Oct 2017, at 9:00, Mahender Sarangam > wrote: > > Hi, > > I'm new to spark and big data, we

Need help

2017-10-10 Thread Mahender Sarangam
Hi, I'm new to Spark and big data; we are doing a PoC and building our warehouse application using Spark. Can anyone share guidance with me on naming conventions for HDFS names, table names, UDFs, and DB names? Any sample architecture diagram? -Mahens

Re: Need help for RDD/DF transformation.

2017-03-30 Thread Yong Zhang
You can just pick the first element out of the array "keys" of DF2 to join on. Otherwise, I don't see any way to avoid a cartesian join. Yong From: Mungeol Heo <mungeol@gmail.com> Sent: Thursday, March 30, 2017 3:05 AM To: ayan guha Cc: Yong Zhang

Re: Need help for RDD/DF transformation.

2017-03-30 Thread Mungeol Heo
7 at 9:03 PM, Yong Zhang <java8...@hotmail.com> wrote: > What is the desired result for RDD/DF 1 (1, a; 3, c; 5, b) and RDD/DF 2 ([1, 2, 3]

Re: Need help for RDD/DF transformation.

2017-03-30 Thread ayan guha
e desired result for > RDD/DF 1 (1, a; 3, c; 5, b) > RDD/DF 2 ([1, 2, 3]; [4, 5]) > Yong > From: Mungeol He

Re: Need help for RDD/DF transformation.

2017-03-29 Thread Mungeol Heo
com> wrote: > What is the desired result for RDD/DF 1 (1, a; 3, c; 5, b) and RDD/DF 2 ([1, 2, 3]; [4, 5]) > Yong > From: Mungeol Heo <mungeol....@gmail.com> > Sent: Wednes

Re: Need help for RDD/DF transformation.

2017-03-29 Thread Yong Zhang
What is the desired result for RDD/DF 1 (1, a; 3, c; 5, b) and RDD/DF 2 ([1, 2, 3]; [4, 5])? Yong From: Mungeol Heo <mungeol@gmail.com> Sent: Wednesday, March 29, 2017 5:37 AM To: user@spark.apache.org Subject: Need help for RDD/DF transformation. Hello, S

Need help for RDD/DF transformation.

2017-03-29 Thread Mungeol Heo
Hello, Suppose I have two RDDs or data frames like those below. RDD/DF 1: 1, a; 3, a; 5, b. RDD/DF 2: [1, 2, 3]; [4, 5]. I need to create a new RDD/DF like the below from RDD/DF 1 and 2: 1, a; 2, a; 3, a; 4, b; 5, b. Is there an efficient way to do this? Any help will be great. Thank you.
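
One possible approach, sketched in Scala (it assumes a SparkSession named spark as in spark-shell, that each array in RDD/DF 2 contains at least one id present in RDD/DF 1, and it uses a window to spread that value across the whole group; helper column names are made up):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._
    import spark.implicits._

    val df1 = Seq((1, "a"), (3, "a"), (5, "b")).toDF("id", "value")
    val df2 = Seq(Seq(1, 2, 3), Seq(4, 5)).toDF("ids")

    val result = df2
      .withColumn("gid", monotonically_increasing_id())  // tag each array/group
      .withColumn("id", explode(col("ids")))             // one row per element
      .join(df1, Seq("id"), "left")                      // attach known values
      .withColumn("value", first("value", ignoreNulls = true)
        .over(Window.partitionBy("gid")))                // spread value to the group
      .select("id", "value")                             // 1,a 2,a 3,a 4,b 5,b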

Re: need help to have a Java version of this scala script

2016-12-17 Thread Richard Xin
Thanks for pointing me in the right direction; I have figured out the way. On Saturday, December 17, 2016 5:23 PM, Igor Berman wrote: do you mind showing what you have in java? in general $"bla" is col("bla") as soon as you import the appropriate function: import static

Re: need help to have a Java version of this scala script

2016-12-17 Thread Igor Berman
Do you mind showing what you have in Java? In general, $"bla" is col("bla") as soon as you import the appropriate functions: import static org.apache.spark.sql.functions.callUDF; import static org.apache.spark.sql.functions.col; udf should be callUDF, e.g. ds.withColumn("localMonth",

need help to have a Java version of this scala script

2016-12-16 Thread Richard Xin
What I am trying to do: I need to add a column (could be a complicated transformation based on the value of a column) to a given dataframe. Scala script: val hContext = new HiveContext(sc) import hContext.implicits._ val df = hContext.sql("select x,y,cluster_no from test.dc") val len = udf((str: String) =>

Re: Need help Creating a rule using the Streaming API

2016-10-27 Thread patrickhuang
Hi, Maybe you can try something like this: val transformed = events.map(event => ((event.user, event.ip), 1)).reduceByKey(_ + _) val alarm = transformed.filter(_._2 >= 10) Patrick -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-Creating-a-rule

Re: Need help with SVM

2016-10-26 Thread Robin East
; From: Aditya Vyas <adityavya...@gmail.com> > Date: Tue, Oct 25, 2016 at 8:16 PM > Subject: Re: Need help with SVM > To: Aseem Bansal <asmbans...@gmail.com> > > >

Fwd: Need help with SVM

2016-10-26 Thread Aseem Bansal
He replied to me. Forwarding to the mailing list. -- Forwarded message -- From: Aditya Vyas <adityavya...@gmail.com> Date: Tue, Oct 25, 2016 at 8:16 PM Subject: Re: Need help with SVM To: Aseem Bansal <asmbans...@gmail.com> Hello, Here is the public gist:https://gis

Re: Need help with SVM

2016-10-26 Thread Robin East
park-user-list.1001560.n3.nabble.com/file/n27955/Screen_Shot_2016-10-25_at_2.png> > Can someone please help? > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Need

Re: Need help with SVM

2016-10-25 Thread Aseem Bansal
55/Screen_Shot_2016-10-25_at_2.png> > Can someone please help? > -- > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-with-SVM-tp27955.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Need help with SVM

2016-10-24 Thread aditya1702
t_2.png> Can someone please help? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-with-SVM-tp27955.html Sent from the Apache Spark User List mailing list archive at Nabb

Need help :- org.apache.spark.SparkException :- No such file or directory

2016-09-29 Thread Madabhattula Rajesh Kumar
Hi Team, I am getting the below exception in Spark jobs. Please let me know how to fix this issue. *Below is my cluster configuration:* I am using SparkJobServer to trigger the jobs. Below is my configuration in SparkJobServer: - num-cpu-cores = 4 - memory-per-node = 4G I have 4 workers

Re: Apache Spark JavaRDD pipe() need help

2016-09-23 Thread शशिकांत कुलकर्णी
Thank you Jakob. I will try as suggested. Regards, Shashi On Fri, Sep 23, 2016 at 12:14 AM, Jakob Odersky wrote: > Hi Shashikant, > > I think you are trying to do too much at once in your helper class. > Spark's RDD API is functional, it is meant to be used by writing many >

Re: Apache Spark JavaRDD pipe() need help

2016-09-22 Thread Jakob Odersky
Hi Shashikant, I think you are trying to do too much at once in your helper class. Spark's RDD API is functional; it is meant to be used by writing many little transformations that will be distributed across a cluster. Apart from that, `rdd.pipe` seems like a good approach. Here is the relevant

Re: Apache Spark JavaRDD pipe() need help

2016-09-22 Thread शशिकांत कुलकर्णी
Hello Jakob, Thanks for replying. Here is a short example of what I am trying, taking an example of a Product column family in Cassandra just to explain my requirement. In Driver.java { JavaRDD productsRdd = Get Products from Cassandra;

Re: Apache Spark JavaRDD pipe() need help

2016-09-21 Thread Jakob Odersky
Can you provide more details? It's unclear what you're asking On Wed, Sep 21, 2016 at 10:14 AM, shashikant.kulka...@gmail.com wrote: > Hi All, > > I am trying to use the JavaRDD.pipe() API. > > I have one object with me from the JavaRDD

Apache Spark JavaRDD pipe() need help

2016-09-21 Thread shashikant.kulka...@gmail.com
w if you need more inputs from me. Thanks in advance. Shashi -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-JavaRDD-pipe-need-help-tp27772.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Spark word count program , need help on integration

2016-09-12 Thread gobi s
Hi, I am new to Spark. I want to develop a word count app and deploy it in local mode. From outside, I want to trigger the program, get the word count output, and show it in the UI. I need help with integrating Spark with the outside world. i) How to trigger the Spark app from the j2ee app

Need help with spark GraphiteSink

2016-06-28 Thread Vijay Vangapandu
Hi, I need help resolving an issue with the Spark GraphiteSink. I am trying to use the Graphite sink, but I have had no luck. Here are the details: the Spark version is 1.4, and I am passing the below 2 arguments to the spark-submit job in YARN cluster mode. --files=/data/svc/metrics/metrics.properties --conf
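
For reference, a minimal metrics.properties for the Graphite sink looks roughly like this (host, port, and prefix are placeholders); the file is shipped with --files and pointed at via --conf spark.metrics.conf=metrics.properties:

    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=graphite.example.com
    *.sink.graphite.port=2003
    *.sink.graphite.period=10
    *.sink.graphite.unit=seconds
    *.sink.graphite.prefix=myapp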

Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Chandeep Singh
You could also fire up a VNC session and access all internal pages from there. > On Feb 15, 2016, at 9:19 AM, Divya Gehlot wrote: > > Hi Sabarish, > Thanks alot for your help. > I am able to view the logs now > > Thank you very much . > > Cheers, > Divya > > > On

Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Divya Gehlot
Hi Sabarish, Thanks a lot for your help. I am able to view the logs now. Thank you very much. Cheers, Divya On 15 February 2016 at 16:51, Sabarish Sasidharan < sabarish.sasidha...@manthan.com> wrote: > You can set up SSH tunneling. > > >

Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Akhil Das
According to the documentation, the hostname that you are seeing for those properties is inherited from *yarn.nodemanager.hostname*. If your requirement is just to see the logs, then you can SSH-tunnel to the

Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Sabarish Sasidharan
You can set up SSH tunneling. http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html Regards Sab On Mon, Feb 15, 2016 at 1:55 PM, Divya Gehlot wrote: > Hi, > I have hadoop cluster set up in EC2. > I am unable to view application logs in
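
In its simplest form the tunnel is a single command (the key path, user, and hostnames are placeholders); the internal NodeManager UI then becomes reachable at http://localhost:8042:

    ssh -i ~/mykey.pem -N -L 8042:ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8042 hadoop@<master-public-dns>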

Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Akhil Das
You can set *yarn.nodemanager.webapp.address* in the yarn-site.xml/yarn-default.xml file to change it I guess. Thanks Best Regards On Mon, Feb 15, 2016 at 1:55 PM, Divya Gehlot wrote: > Hi, > I have hadoop cluster set up in EC2. > I am unable to view application logs

Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Divya Gehlot
Hi, I have a Hadoop cluster set up in EC2. I am unable to view application logs in the Web UI, as it uses the internal hostname, like below: http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8042 How can I change this to an external one, or

Re: Need help in spark-Scala program

2016-02-01 Thread Vinti Maheshwari
Hi, Sorry, please ignore my message, It was sent by mistake. I am still drafting. Regards, Vinti On Mon, Feb 1, 2016 at 2:25 PM, Vinti Maheshwari wrote: > Hi All, > > I recently started learning Spark. I need to use spark-streaming. > > 1) Input, need to read from

Need help in spark-Scala program

2016-02-01 Thread Vinti Maheshwari
Hi All, I recently started learning Spark. I need to use Spark Streaming. 1) Input: need to read from MongoDB db.event_gcovs.find({executions:"56791a746e928d7b176d03c0", valid:1, infofile:{$exists:1}, geo:"sunnyvale"}, {infofile:1}).count() > Number of Info files: 24441 /* 0 */ { "_id" :

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Michael Armbrust
The analog to PairRDD is a GroupedDataset (created by calling groupBy), which offers similar functionality but doesn't require you to construct new objects that are in the form of key/value pairs. It doesn't matter if they are complex objects, as long as you can create an encoder for them

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Steve Lewis
Thanks - this helps a lot except for the issue of looking at schools in neighboring regions On Wed, Jan 20, 2016 at 10:43 AM, Michael Armbrust wrote: > The analog to PairRDD is a GroupedDataset (created by calling groupBy), > which offers similar functionality, but

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Michael Armbrust
Yeah, that's tough. Perhaps you could do something like a flatMap and emit multiple virtual copies of each student for each region that is neighboring their actual region. On Wed, Jan 20, 2016 at 10:50 AM, Steve Lewis wrote: > Thanks - this helps a lot except for the issue
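
A rough sketch of that flatMap idea in Scala (it assumes a Dataset[Child] named children, a SparkSession's implicits in scope, and integer region ids laid out so that a region's neighbors are id-1 and id+1; all names are hypothetical):

    import spark.implicits._

    case class Child(id: Long, region: Int)

    def regionsToSearch(region: Int): Seq[Int] =
      Seq(region - 1, region, region + 1)  // own region plus its neighbors

    // One "virtual copy" of each child per region it should be compared against:
    val childrenByRegion = children.flatMap(c =>
      regionsToSearch(c.region).map(r => (r, c)))
    // Schools keyed by their actual region can then be grouped/joined on the key.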

I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Steve Lewis
We have been working on a large search problem which we have been solving in the following way. We have two sets of objects, say children and schools. The objective is to find the closest school to each child. There is a distance measure, but it is relatively expensive and would be very costly to apply

Need Help in Spark Hive Data Processing

2016-01-06 Thread Balaraju.Kagidala Kagidala
Hi, I am a new user of Spark. I am trying to use Spark to process huge Hive data using Spark DataFrames. I have a 5-node Spark cluster, each node with 30 GB memory. I want to process a Hive table with 450 GB of data using DataFrames. Fetching a single row from the Hive table takes 36 mins. Please suggest

Re: Need Help in Spark Hive Data Processing

2016-01-06 Thread Jeff Zhang
It depends on how you fetch the single row. Is your query complex? On Thu, Jan 7, 2016 at 12:47 PM, Balaraju.Kagidala Kagidala < balaraju.kagid...@gmail.com> wrote: > Hi , > > I am new user to spark. I am trying to use Spark to process huge Hive > data using Spark DataFrames. > > > I have 5

Re: Need Help in Spark Hive Data Processing

2016-01-06 Thread Jörn Franke
You need the table in an efficient format, such as ORC or Parquet. Have the table sorted appropriately (hint: most discriminating column in the where clause). Do not use SAN or virtualization for the slave nodes. Can you please post your query? I always recommend avoiding single updates where

Re: Need Help Diagnosing/operating/tuning

2015-11-23 Thread Igor Berman
You should check why the executor is killed; as soon as it's killed you can get all kinds of strange exceptions. Either give your executors more memory (4G is rather small for Spark), or try to decrease your input, or maybe split it into more partitions in the input format. 23G in LZO might expand to x?

Re: Need Help Diagnosing/operating/tuning

2015-11-22 Thread Jeremy Davis
It seems like the problem is related to --executor-cores. Is there possibly some sort of race condition when using multiple cores per executor? On Nov 22, 2015, at 12:38 PM, Jeremy Davis > wrote: Hello, I’m at a loss trying to diagnose why

Need Help Diagnosing/operating/tuning

2015-11-22 Thread Jeremy Davis
Hello, I’m at a loss trying to diagnose why my Spark job is failing (it works fine on small data). It is failing during the repartition, or on the subsequent steps, which then seem to fail and fall back to repartitioning. I’ve tried adjusting every parameter I can find, but have had no success.

Re: Drools and Spark Integration - Need Help

2015-09-07 Thread Akhil Das
How are you integrating it with spark? Thanks Best Regards On Fri, Sep 4, 2015 at 12:11 PM, Shiva moorthy wrote: > Hi Team, > > I am able to integrate Drools with Apache spark but after integration my > application runs slower. > Could you please give ideas about how

Drools and Spark Integration - Need Help

2015-09-04 Thread Shiva moorthy
Hi Team, I am able to integrate Drools with Apache Spark, but after integration my application runs slower. Could you please give ideas about how Drools can be efficiently integrated with Spark? Appreciate your help. Thanks and Regards, Shiva

Re: Re: Need help in setting up spark cluster

2015-07-23 Thread fightf...@163.com
I suggest you first deploy a Spark standalone cluster to run some integration tests; you can also consider running Spark on YARN for later development use cases. Best, Sun. fightf...@163.com From: Jeetendra Gangele Date: 2015-07-23 13:39 To: user Subject: Re: Need help in setting

Re: Re: Need help in setting up spark cluster

2015-07-23 Thread Jeetendra Gangele
*To:* user user@spark.apache.org *Subject:* Re: Need help in setting up spark cluster Can anybody help here? On 22 July 2015 at 10:38, Jeetendra Gangele gangele...@gmail.com wrote: Hi All, I am trying to capture the user activities for a real estate portal. I am using RabbitMQ and Spark

Need help in SparkSQL

2015-07-22 Thread Jeetendra Gangele
Hi All, I have data in MongoDB (a few TBs) which I want to migrate to HDFS to do complex query analysis on. Queries like AND queries involve multiple fields. So my question is: in which format should I store the data in HDFS so that processing will be fast for such kinds of queries?

Re: Need help in SparkSQL

2015-07-22 Thread Jörn Franke
Can you provide an example of an AND query? If you do just lookups you should try HBase/Phoenix; otherwise you can try ORC with storage index and/or compression, but this depends on what your queries look like. On Wed, 22 Jul 2015 at 14:48, Jeetendra Gangele gangele...@gmail.com wrote: HI

Re: Need help in SparkSQL

2015-07-22 Thread Jörn Franke
I do not think you can put all your queries into the row key without duplicating the data for each query. However, this would be more of a last resort. Have you checked out Phoenix for HBase? This might suit your needs. It makes things much simpler, because it provides SQL on top of HBase. Nevertheless,

Re: Need help in setting up spark cluster

2015-07-22 Thread Jeetendra Gangele
Can anybody help here? On 22 July 2015 at 10:38, Jeetendra Gangele gangele...@gmail.com wrote: Hi All, I am trying to capture the user activities for a real estate portal. I am using a RabbitMQ and Spark Streaming combination, where I push all the events to RabbitMQ and then 1-sec micro

Re: Need help in SparkSQL

2015-07-22 Thread Jeetendra Gangele
Queries will be something like: 1. How many users visited a 1 BHK flat in the last 1 hour in a given area? 2. How many visitors for flats in a given area? 3. List all users who bought a given property in the last 30 days. Further, it may get more complex, involving multiple parameters in my query. The

RE: Need help in SparkSQL

2015-07-22 Thread Mohammed Guller
Parquet Mohammed From: Jeetendra Gangele [mailto:gangele...@gmail.com] Sent: Wednesday, July 22, 2015 5:48 AM To: user Subject: Need help in SparkSQL HI All, I have data in MongoDb(few TBs) which I want to migrate to HDFS to do complex queries analysis on this data.Queries like AND queries

Re: Need help with ALS Recommendation code

2015-04-05 Thread Xiangrui Meng
Could you try `sbt package` or `sbt compile` and see whether there are errors? It seems that you haven't reached the ALS code yet. -Xiangrui On Sat, Apr 4, 2015 at 5:06 AM, Phani Yadavilli -X (pyadavil) pyada...@cisco.com wrote: Hi , I am trying to run the following command in the Movie

RE: Need help with ALS Recommendation code

2015-04-05 Thread Phani Yadavilli -X (pyadavil)
@spark.apache.org Subject: Re: Need help with ALS Recommendation code Could you try `sbt package` or `sbt compile` and see whether there are errors? It seems that you haven't reached the ALS code yet. -Xiangrui On Sat, Apr 4, 2015 at 5:06 AM, Phani Yadavilli -X (pyadavil) pyada...@cisco.com wrote

Need help with ALS Recommendation code

2015-04-04 Thread Phani Yadavilli -X (pyadavil)
Hi, I am trying to run the following command in the Movie Recommendation example provided by the AMP Camp tutorial. Command: sbt package run /movielens/medium Exception: sbt.TrapExitSecurityException thrown from the UncaughtExceptionHandler in thread run-main-0 java.lang.RuntimeException:

Re: Configration Problem? (need help to get Spark job executed)

2015-02-17 Thread Arush Kharbanda
Hi It could be due to a connectivity issue between the master and the slaves. I have seen this issue occur for the following reasons. Are the slaves visible in the Spark UI? And how much memory is allocated to the executors? 1. Syncing of configuration between Spark master and slaves. 2.

Configration Problem? (need help to get Spark job executed)

2015-02-14 Thread NORD SC
Hi all, I am new to Spark and seem to have hit a common newbie obstacle. I have a pretty simple setup and job, but I am unable to get past this error when executing a job: TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered

Re: Need help for Spark-JobServer setup on Maven (for Java programming)

2014-12-30 Thread Sasi
Does my question make sense, or does it require some elaboration? Sasi -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-for-Spark-JobServer-setup-on-Maven-for-Java-programming-tp20849p20896.html Sent from the Apache Spark User List mailing list archive

Re: Need help for Spark-JobServer setup on Maven (for Java programming)

2014-12-30 Thread abhishek

Re: Need help for Spark-JobServer setup on Maven (for Java programming)

2014-12-30 Thread Sasi
need to integrate with Vaadin (Java Framework). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-for-Spark-JobServer-setup-on-Maven-for-Java-programming-tp20849p20898.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Need help for Spark-JobServer setup on Maven (for Java programming)

2014-12-30 Thread abhishek
to run it. However, for our requirement, we need to integrate with Vaadin (Java Framework).

Re: Need help for Spark-JobServer setup on Maven (for Java programming)

2014-12-30 Thread Sasi

Re: Need help for Spark-JobServer setup on Maven (for Java programming)

2014-12-30 Thread abhishek
around 1 lakh rows in our Cassandra table presently. Will the REST URL result withstand the response size?

Re: Need help for Spark-JobServer setup on Maven (for Java programming)

2014-12-30 Thread Sasi
Thanks Abhishek. We are good now, with an answer to try. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-for-Spark-JobServer-setup-on-Maven-for-Java-programming-tp20849p20906.html Sent from the Apache Spark User List mailing list archive

Need help for Spark-JobServer setup on Maven (for Java programming)

2014-12-24 Thread Sasi
(for Apache Spark - Java programming). We would be glad to receive your suggestions. Sasi -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-for-Spark-JobServer-setup-on-Maven-for-Java-programming-tp20849.html Sent from the Apache Spark User List mailing

Need help with ThriftServer/Spark1.1.0

2014-09-15 Thread Yana Kadiyska
Hi ladies and gents, trying to get Thrift server up and running in an effort to replace Shark. My first attempt to run sbin/start-thriftserver resulted in: 14/09/15 17:09:05 ERROR TThreadPoolServer: Error occurred during processing of message. java.lang.RuntimeException:

RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-28 Thread Andrew Lee
it back to the AM and eventually stored somewhere for later troubleshooting. I'm not clear how this path is constructed without reading the source code, so I can't give a better answer. AL From: jianshi.hu...@gmail.com Date: Mon, 28 Jul 2014 13:32:05 +0800 Subject: Re: Need help, got

Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-28 Thread Jianshi Huang
I see Andrew, thanks for the explanation. On Tue, Jul 29, 2014 at 5:29 AM, Andrew Lee alee...@hotmail.com wrote: I was thinking maybe we can suggest the community enhance the Spark HistoryServer to capture the last failure exception from the container logs in the last failed stage?

Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-27 Thread Jianshi Huang
: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode To: user@spark.apache.org I nailed it down to a union operation, here's my code snippet: val properties: RDD[((String, String, String), Externalizer[KeyValue])] = vertices.map { ve => val (vertices

RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-25 Thread Andrew Lee
15:12:18 +0800 Subject: Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode To: user@spark.apache.org I nailed it down to a union operation, here's my code snippet: val properties: RDD[((String, String, String), Externalizer[KeyValue])] = vertices.map { ve

Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-24 Thread Jianshi Huang
I can successfully run my code in local mode using spark-submit (--master local[4]), but I get ExceptionInInitializerError errors in YARN-client mode. Any hints as to what the problem is? Is it a closure serialization problem? How can I debug it? Your answers would be very helpful. 14/07/25 11:48:14

Need help with coalesce

2014-07-19 Thread Madhura
..,10 part2: 8,9,10,...,20 part3: 18,19,20,...,30 and so on... Thanks and regards, Madhura -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-with-coalesce-tp10243.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Need help on Spark UDF (Join) Performance tuning .

2014-07-18 Thread S Malligarjunan
Hello Experts, I would highly appreciate your input; please suggest or give me a hint as to what the issue could be here. Thanks and Regards, Malligarjunan S. On Thursday, 17 July 2014, 22:47, S Malligarjunan smalligarju...@yahoo.com wrote: Hello Experts, I am facing a performance problem when I use

Re: Need help on Spark UDF (Join) Performance tuning .

2014-07-18 Thread Michael Armbrust
It's likely that, since your UDF is a black box to Hive's query optimizer, it must choose a less efficient join algorithm that passes all possible matches to your function for comparison. This will happen any time your UDF touches attributes from both sides of the join. In general you can

Need help on Spark UDF (Join) Performance tuning .

2014-07-17 Thread S Malligarjunan
Hello Experts, I am facing a performance problem when I use the UDF function call. Please help me tune the query. Please find the details below: shark> select count(*) from table1; OK 151096 Time taken: 7.242 seconds shark> select count(*) from table2; OK 938 Time taken: 1.273 seconds Without

Re: Need help on spark Hbase

2014-07-16 Thread Madabhattula Rajesh Kumar
Hi Team, Now I've changed my code and am reading configuration from the hbase-site.xml file (this file is on the classpath). When I run this program using mvn exec:java -Dexec.mainClass=com.cisco.ana.accessavailability.AccessAvailability, it works fine. But when I run this program from spark-submit

Re: Need help on spark Hbase

2014-07-16 Thread Jerry Lam
Hi Rajesh, I saw: Warning: Local jar /home/rajesh/hbase-0.96.1.1-hadoop2/lib/hbase-client-0.96.1.1-hadoop2.jar, does not exist, skipping. in your log. I believe this jar contains the HBaseConfiguration. I'm not sure what went wrong in your case, but can you try without spaces in --jars, i.e.

Re: Need help on spark Hbase

2014-07-15 Thread Jerry Lam
Hi Rajesh, can you describe your spark cluster setup? I saw localhost:2181 for zookeeper. Best Regards, Jerry On Tue, Jul 15, 2014 at 9:47 AM, Madabhattula Rajesh Kumar mrajaf...@gmail.com wrote: Hi Team, Could you please help me to resolve the issue. *Issue *: I'm not able to connect

Re: Need help on spark Hbase

2014-07-15 Thread Krishna Sankar
One vector to check is the HBase libraries in the --jars, as in: spark-submit --class <your class> --master <master url> --jars

Re: Need help on spark Hbase

2014-07-15 Thread Jerry Lam
Hi Rajesh, I have a feeling that this is not directly related to Spark, but I might be wrong. The reason why is that when you do: Configuration configuration = HBaseConfiguration.create(); by default, it reads the configuration file hbase-site.xml from your classpath and ... (I don't remember
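
Since HBaseConfiguration.create() resolves hbase-site.xml from the classpath, one common workaround is to ship the file with the job explicitly; a sketch with placeholder paths and class name (exact classpath handling can differ by deploy mode; files shipped with --files land in each container's working directory):

    spark-submit \
      --class com.example.MyHBaseApp \
      --files /etc/hbase/conf/hbase-site.xml \
      --driver-class-path /etc/hbase/conf \
      myapp.jar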

Re: Need help on spark Hbase

2014-07-15 Thread Tathagata Das
Also, it helps if you post us logs, stacktraces, exceptions, etc. TD On Tue, Jul 15, 2014 at 10:07 AM, Jerry Lam chiling...@gmail.com wrote: Hi Rajesh, I have a feeling that this is not directly related to spark but I might be wrong. The reason why is that when you do: Configuration
