Hi Sri,
Thanks for the question.
You can simply start by doing this in the initial stage:
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val customerList = sqlContext.read.json(args(0)).coalesce(20) // using a JSON example here

where the argument is the path to the file(s). This will reduce the number of
partitions.
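To verify the effect, a quick sanity check along these lines (the variable
names here are just for illustration):

val raw = sqlContext.read.json(args(0))
println(s"partitions before: ${raw.rdd.partitions.size}")
val customerList = raw.coalesce(20)
println(s"partitions after: ${customerList.rdd.partitions.size}") // at most 20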
Hi,
Thanks for the question.
The Oozie documentation for the Spark action would be a good starting point
for your workflow application.
-
Neelesh S. Salian
Cloudera
Thanks for the question.
What kind of data rate are you expecting to receive?
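For reference, a minimal sketch of where the batch interval goes; the
one-second value is purely illustrative, and whether it is sustainable
depends on that data rate:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("RateCheck") // placeholder app name
val ssc = new StreamingContext(conf, Seconds(1))
// Sub-second intervals can be expressed with Milliseconds(...), but each
// batch must finish processing within the interval for the app to keep up.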
-
Neelesh S. Salian
Cloudera
Hi,
This is a good spot to start for Spark on YARN:
https://spark.apache.org/docs/1.5.0/running-on-yarn.html
The page is specific to the version you are on; you can toggle between pages
for other versions.
-
Neelesh S. Salian
Cloudera
Thank you for the question.
What is different on this machine as compared to the ones where the job
succeeded?
-
Neelesh S. Salian
Cloudera
Hello,
Thank you for posting the question.
To begin, I do have a few questions.
1) What is the size of the YARN installation? How many NodeManagers?
2) Notes to remember:
Container Virtual CPU Cores (yarn.nodemanager.resource.cpu-vcores)
>> Number of virtual CPU cores that can be allocated for containers.
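To connect that to the Spark side, a hedged sketch (the values are purely
illustrative): each executor container asks YARN for spark.executor.cores
vcores and spark.executor.memory (plus overhead), and that request has to
fit within what a single NodeManager advertises.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.cores", "2")   // must fit within yarn.nodemanager.resource.cpu-vcores
  .set("spark.executor.memory", "4g") // must fit within yarn.nodemanager.resource.memory-mb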
Feel free to correct me if I am wrong.
But I believe this isn't a feature yet:
"create a new Spark context within a single JVM process (driver)"
A few questions for you:
1) Is Kerberos set up correctly for you (the user)?
2) Could you please add the command/code you are executing?
Checking to see the exact command should help narrow down whether the setup
is the issue.
Hello,
Thank you for the question.
The status UNDEFINED means the application has not completed and has not yet
been allocated resources.
Upon getting its assignment, it will progress to RUNNING and then to
SUCCEEDED upon completion.
It isn't a problem that you should worry about.
You should, however, make sure to tune your YARN resource configuration.
Hello,
I am parsing a text file and inserting the parsed values into a Hive table.
Code:
files = sc.wholeTextFiles("hdfs://nameservice1:8020/user/root/email.txt",
                          minPartitions=16, use_unicode=True)
# Setting use_unicode to False didn't help either
sqlContext.sql("DROP TABLE emails")
sqlContext.sq
Please refrain from posting such messages on this email thread.
This is specific to the Spark ecosystem and not an avenue to advertise an
entity/company.
Thank you.
-
Neelesh S. Salian
Cloudera
Hi Ajay,
Feel free to open a JIRA with the fields that you think are missing and what
kind of documentation you wish to see.
It would be best to have it in a JIRA to actually track and triage your
suggestions.
Thank you.
-
Neelesh S. Salian
Cloudera
Hi,
Thanks for the question.
I do see this at the bottom:
16/02/17 15:31:02 ERROR SparkContext: Error initializing SparkContext.
Some questions to help get more understanding:
1) Does this happen to any other jobs?
2) Any changes to the Spark setup in recent time?
3) Could you open the tracking URL for the application to get more detail?
Hi,
Thanks for the question.
1) The core-site.xml holds the parameter for the defaultFS:

fs.defaultFS
hdfs://<namenode-host>:8020

This will be prepended to your value of spark.eventLog.dir. So depending on
which location you intend to write to, you can point it to either HDFS or
the local filesystem.
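As a sketch of the two options (the directories here are only examples):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  // A path without a scheme resolves against fs.defaultFS, i.e. into HDFS:
  .set("spark.eventLog.dir", "hdfs:///user/spark/applicationHistory")
  // Whereas an explicit file: scheme forces a local directory instead:
  // .set("spark.eventLog.dir", "file:///tmp/spark-events")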
Hi,
Thanks for the question.
Is it possible for you to elaborate on your application?
The flow of the application will help in understanding what could potentially
cause things to slow down.
Do the logs give you any idea of what goes on? Have you had a chance to look?
Thank you.
-
Neelesh S. Salian
Hello,
Thanks for the question.
1) Typically the ResourceManager in YARN will print out the Aggregate
Resource Allocation for the application, once you have found the specific
application using its application id.
2) As with MapReduce, there is a parameter that can be set either in
spark-defaults.conf or on the spark-submit command line.
Thanks for the question.
The documentation here:
https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit
lists a variety of submission techniques.
You can vary the master URL to suit your needs, whether it be local, YARN, or
Mesos.
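Equivalently, if you set the master in code rather than via spark-submit
--master, only the URL changes (Spark 1.x master strings; the Mesos host and
port are placeholders):

import org.apache.spark.SparkConf

val localConf = new SparkConf().setMaster("local[4]")          // local, 4 threads
val yarnConf  = new SparkConf().setMaster("yarn-client")       // YARN, client mode
val mesosConf = new SparkConf().setMaster("mesos://host:5050") // Mesos master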
-
Neelesh S. Salian
Cloudera
Hi Chris,
Thank you for posting the question.
Tuning Spark configurations is a tricky task, since there are a lot of
factors to consider.
The configurations that you listed cover most of them.
To understand the situation, so that it can guide you in making a decision
about tuning:
1) What kind of Spark application is it?
If --jars doesn't work,
try --conf "spark.executor.extraClassPath=<path-to-jar>"
Hello shreesh,
That would be quite a challenge to understand.
A few things that I think should help estimate those numbers:
1) Understanding the cost of the individual transformations in the
application
E.g. a flatMap can be more expensive in memory as opposed to a map.
2) The communication pattern of the application, e.g. how much data gets
shuffled between stages.
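A toy illustration of point 1 (the input path is a placeholder):

val lines = sc.textFile("hdfs:///data/input.txt")
// map produces exactly one output record per input record:
val lengths = lines.map(_.length)
// flatMap can fan each record out into many, so the intermediate data
// (and memory pressure) can grow well beyond the input size:
val words = lines.flatMap(_.split("\\s+"))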
As discussed during the meetup, the following information should help while
creating a topic on the User mailing list.
1) The version of Spark and Hadoop should be included, to help reproduce the
issue or understand whether the issue is a version limitation.
2) An explanation of the scenario in as much detail as possible.
Hello,
Is the json file in HDFS or local?
"/home/esten/ami/usaf.json" is this an HDFS path?
Suggestions:
1) Specify "file:/home/esten/ami/usaf.json"
2) Or move the usaf.json file into HDFS since the application is looking for
the file in HDFS.
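A sketch of both options, assuming a sqlContext is available (the HDFS
destination below is only an example):

// Option 1: read explicitly from the local filesystem:
val localDf = sqlContext.read.json("file:/home/esten/ami/usaf.json")
// Option 2: after a hadoop fs -put of the file into HDFS, read it from there:
val hdfsDf = sqlContext.read.json("hdfs:///user/esten/usaf.json")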
Please let me know if that helps.
Thank you.
Hi,
Spark on YARN should help with the memory management for Spark jobs.
Here is a good starting point:
https://spark.apache.org/docs/latest/running-on-yarn.html
YARN integrates well with HDFS and should be a good solution for a large
cluster.
What specific features are you looking for that HDFS does not provide?
Hello,
Since the other queues are fine, I reckon there may be a limit on the max
apps or memory on this queue in particular.
I don't suspect FairScheduler limits either, but on this queue we may be
seeing/hitting a maximum.
Could you try to get the configs for the queue? That should provide more
insight.
Hi,
Thanks for the added information; it helps add more context.
Is that specific queue different from the others?
FairScheduler.xml should have the information needed, or a separate
allocations.xml if you have one.
Something of this format:

<queue name="sample_queue">
  <minResources>1 mb,0vcores</minResources>
  <maxResources>9 mb,0vcores</maxResources>
  <maxRunningApps>50</maxRunningApps>
  <weight>0.1</weight>
</queue>
By writing PDF files, do you mean something equivalent to a hadoop fs -put
<src> <dst>?
I'm not sure how PDFBox works, though; have you tried writing the files
individually, without Spark?
Once you have established that as a starting point, we can potentially look
at how Spark can be interfaced to write to HDFS.
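If the goal is just the fs -put equivalent from code, a hedged sketch using
the Hadoop FileSystem API (the paths are hypothetical):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
// Copy a locally generated PDF (e.g. written out by PDFBox) into HDFS:
fs.copyFromLocalFile(new Path("file:///tmp/report.pdf"),
                     new Path("/user/me/pdfs/report.pdf"))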
I see the other jobs SUCCEEDED without issues.
Could you snapshot the FairScheduler activity as well?
My guess is, with the single core, it is reaching a NodeManager that is
still busy with other jobs, and the job ends up in a waiting state.
Does the job eventually complete?
1) Could you share your command?
2) Are the kafka brokers on the same host?
3) Could you run a --describe on the topic, to see if the topic is set up
correctly (just to be sure)?
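For comparison, a minimal direct-stream sketch against the Spark 1.x Kafka
integration; the broker list and topic name are placeholders for your setup:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(new SparkConf().setAppName("KafkaCheck"), Seconds(5))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("mytopic"))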
+1 to setting executor-memory to 5g.
Do check the overhead space for both the driver and the executor as per
Wilfred's suggestion.
Typically, 384 MB should suffice.
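In Spark 1.x terms, a sketch that mirrors the suggestion above (values are
illustrative; the overhead properties take megabytes):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "5g")
  .set("spark.yarn.executor.memoryOverhead", "384") // off-heap headroom, in MB
  .set("spark.yarn.driver.memoryOverhead", "384")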
Could you check whether your workers are registered to the Master?
Also look at the heap size of each Worker.
For reference, could you paste the exact command that you executed?
You mentioned that you changed the script; what is the change?
Try running it like this:
sudo -u hdfs spark-submit --class org.apache.spark.examples.SparkPi \
  --deploy-mode cluster --master yarn \
  hdfs:///user/spark/spark-examples-1.2.0-cdh5.3.2-hadoop2.5.0-cdh5.3.2.jar 10

Caveats:
1) Make sure the permissions of /user/nick are 775 or 777.
2) No need for the hostname and port in the HDFS URI, since fs.defaultFS
supplies them.
Client mode would not support HDFS jar extraction.
I tried this:
sudo -u hdfs spark-submit --class org.apache.spark.examples.SparkPi \
  --deploy-mode cluster --master yarn \
  hdfs:///user/spark/spark-examples-1.2.0-cdh5.3.2-hadoop2.5.0-cdh5.3.2.jar 10
And it worked.