Accessing log for lost executors

2016-12-01 Thread Nisrina Luthfiyati
Hi all, I'm trying to troubleshoot an ExecutorLostFailure issue. In the Spark UI I noticed that the Executors tab only lists active executors; is there any way I can see the logs for dead executors so that I can find out why they were lost? I'm using Spark 1.5.2 on YARN 2.7.1. Thanks! Nisrina

Usage of -javaagent with spark.executor.extrajavaoptions configuration

2016-12-01 Thread Kanchan W
Hello, I am an Apache Spark newbie and have a question regarding the spark.executor.extraJavaOptions configuration property in Spark 2.0.2. I need to start a javaagent on the Spark executors in standalone mode, both from the interactive shell and via spark-submit. In order to do the same, I
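
(Sketch, not from the thread: a minimal PySpark example of one way to pass a -javaagent flag to executors through spark.executor.extraJavaOptions; the agent jar path and app name are placeholders.)

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("javaagent-example")                      # placeholder app name
             .config("spark.executor.extraJavaOptions",
                     "-javaagent:/path/to/agent.jar")           # placeholder agent path
             .getOrCreate())

The same setting can also be supplied on the command line, e.g. spark-submit --conf "spark.executor.extraJavaOptions=-javaagent:/path/to/agent.jar".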

Fwd: [Spark Dataset]: How to conduct co-partition join in the new Dataset API in Spark 2.0

2016-12-01 Thread w.zhaokang
Hi all, In the old Spark RDD API, key-value PairRDDs can be co-partitioned to avoid shuffles, giving high join performance. In the new Dataset API in Spark 2.0, is the high-performance, shuffle-free join via the co-partition mechanism still feasible? I have looked through the API doc but

[Spark Dataset]: How to conduct co-partition join in the new Dataset API in Spark 2.0

2016-12-01 Thread Dale Wang
Hi all, In the old Spark RDD API, key-value PairRDDs can be co-partitioned to avoid shuffles, giving high join performance. In the new Dataset API in Spark 2.0, is the high-performance, shuffle-free join via the co-partition mechanism still feasible? I have looked through the API doc but

Re: [GraphFrame, Pyspark] Weighted Edge in PageRank

2016-12-01 Thread Weiwei Zhang
Thanks Felix. Does anyone know when this feature will be rolled out in GraphFrame? Best Regards, Weiwei On Thu, Dec 1, 2016 at 5:22 PM, Felix Cheung wrote: > That's correct - currently GraphFrame does not compute PageRank with weighted edges.

Re: [GraphFrame, Pyspark] Weighted Edge in PageRank

2016-12-01 Thread Felix Cheung
That's correct - currently GraphFrame does not compute PageRank with weighted edges. From: Weiwei Zhang Sent: Thursday, December 1, 2016 2:41 PM Subject: [GraphFrame, Pyspark] Weighted Edge in PageRank To:

Re: Spark 2.x Pyspark Spark SQL createDataframe Error

2016-12-01 Thread Michal Šenkýř
Hello Vinayak, As I understand it, Spark creates a Derby metastore database in the current location, in the metastore_db subdirectory, whenever you first use an SQL context. This database cannot be shared by multiple instances. This should be controlled by the javax.jdo.option.ConnectionURL
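
(Sketch, not from the thread: a rough PySpark example of one way to point the embedded Derby metastore at an explicit per-application location, assuming the property can be passed through the spark.hadoop.* prefix; the path is a placeholder, and hive-site.xml is the more conventional place to set this.)

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("metastore-location-example")
             .config("spark.hadoop.javax.jdo.option.ConnectionURL",
                     "jdbc:derby:;databaseName=/tmp/app1_metastore_db;create=true")
             .enableHiveSupport()
             .getOrCreate())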

RE: How to Check Dstream is empty or not?

2016-12-01 Thread bryan.jeffrey
The stream is just a wrapper over batch operations. You can check whether each batch is empty with something like: val isEmpty = stream.transform(rdd => rdd.sparkContext.parallelize(Seq(rdd.isEmpty()))) This will give you a stream of Booleans indicating whether the given batches are empty. Bryan Jeffrey From: rockinf...@gmail.com Sent: Thursday,

unsubscribe

2016-12-01 Thread Patnaik, Vandana

Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread Michael Armbrust
Yes ! On Thu, Dec 1, 2016 at 12:57 PM, ayan guha wrote: > Thanks TD. Will it be available in pyspark too? > On 1 Dec 2016 19:55, "Tathagata Das" wrote: > >> In

[GraphFrame, Pyspark] Weighted Edge in PageRank

2016-12-01 Thread Weiwei Zhang
Hi guys, I am trying to compute the PageRank for the locations in the following dummy dataframe:

    src  des  shared_gas_stations
    A    B    2
    A    C    10
    C    E    3
    D    E    12
    E    G    5
    ...

I have tried the
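
(Sketch, not from the thread: per the replies earlier in this digest, GraphFrames' PageRank ignores edge weights; below is a minimal sketch of the standard unweighted call, assuming the edge DataFrame is renamed to the src/dst columns GraphFrames expects.)

    from graphframes import GraphFrame

    edges = df.withColumnRenamed("des", "dst")                      # GraphFrames expects src/dst
    vertices = (edges.select("src").union(edges.select("dst"))
                .distinct().withColumnRenamed("src", "id"))
    g = GraphFrame(vertices, edges)
    ranks = g.pageRank(resetProbability=0.15, maxIter=10)           # shared_gas_stations is ignored
    ranks.vertices.show()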

Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread ayan guha
Thanks TD. Will it be available in pyspark too? On 1 Dec 2016 19:55, "Tathagata Das" wrote: > In the meantime, if you are interested, you can read the design doc in the > corresponding JIRA - https://issues.apache.org/jira/browse/SPARK-18124 > > On Thu, Dec 1, 2016

quick question

2016-12-01 Thread kant kodali
Assume I am running a Spark client program in client mode and a Spark cluster in standalone mode. I want some clarification on the following: 1. Build a DAG 2. DAG Scheduler 3. Task Scheduler. I want to know which of the above parts are done by the SPARK CLIENT and which of the above parts are done by

unsubscribe

2016-12-01 Thread Vishal Soni

support vector regression in spark

2016-12-01 Thread roni
Hi All, I want to know how I can do support vector regression in Spark. Thanks, R

Re: Spark-shell doesn't see changes coming from Kafka topic

2016-12-01 Thread Tathagata Das
Can you confirm the following? 1. Are you sending new data to the Kafka topic AFTER starting the streaming query? Since you have specified `startingOffsets` as `latest`, data needs to be sent to the topic after the query starts for the query to receive it. 2. Are you able to read Kafka data using Kafka's
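
(Sketch, not from the thread: a minimal PySpark example of the setup being described, reading a Kafka 0.10 topic with startingOffsets = latest; the broker address and topic name are placeholders, and the spark-sql-kafka-0-10 package must be on the classpath.)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-latest-example").getOrCreate()

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker-host:9092")   # placeholder broker
          .option("subscribe", "events")                           # placeholder topic
          .option("startingOffsets", "latest")                     # only data published after the query starts
          .load())

    query = (df.selectExpr("CAST(value AS STRING)")
             .writeStream
             .format("console")
             .start())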

Re: Spark 2.0.2 , using DStreams in Spark Streaming . How do I create SQLContext? Please help

2016-12-01 Thread shyla deshpande
Used SparkSession; works now. Thanks. On Wed, Nov 30, 2016 at 11:02 PM, Deepak Sharma wrote: > In Spark 2.0, SparkSession was introduced, which you can use to query Hive as well. Just make sure you create the SparkSession with the enableHiveSupport() option. Thanks
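
(Sketch, not from the thread: a minimal PySpark example of creating a SparkSession with Hive support and querying Hive, as suggested above; the table name is a placeholder.)

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-query-example")
             .enableHiveSupport()          # needed so spark.sql() can see Hive tables
             .getOrCreate())

    spark.sql("SELECT * FROM some_hive_table LIMIT 10").show()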

Re: Spark 2.x Pyspark Spark SQL createDataframe Error

2016-12-01 Thread Vinayak Joshi5
This is the error received: 16/12/01 22:35:36 ERROR Schema: Failed initialising database. Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to

Spark 2.x Pyspark Spark SQL createDataframe Error

2016-12-01 Thread Vinayak Joshi5
With a local Spark instance built with Hive support (-Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver), the following script/sequence works in PySpark without any error against 1.6.x, but fails with 2.x: people = sc.parallelize(["Michael,30", "Andy,12", "Justin,19"])
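
(The script above is truncated; the following is a hedged sketch of the classic name/age example it appears to follow, with the continuation, column names, and the sqlContext variable as assumptions. Per the thread subject, the reported failure surfaces at the createDataFrame step, when the Derby metastore is first initialised.)

    from pyspark.sql import Row

    people = sc.parallelize(["Michael,30", "Andy,12", "Justin,19"])
    rows = people.map(lambda l: l.split(",")).map(lambda p: Row(name=p[0], age=int(p[1])))
    df = sqlContext.createDataFrame(rows)   # where the metastore error is reported on 2.x
    df.show()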

Unsubscribe

2016-12-01 Thread hardik nagda

RE: build models in parallel

2016-12-01 Thread Masood Krohy
You can use your groupId as a grid parameter and filter your dataset by this id in a pipeline stage before feeding it to the model. The following may help: http://spark.apache.org/docs/latest/ml-tuning.html
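
(Sketch, not the grid-parameter approach described above, but a simpler hedged illustration of the underlying idea: fit one model per group by filtering the DataFrame; the column names and the choice of LinearRegression are assumptions.)

    from pyspark.ml.regression import LinearRegression

    group_ids = [row.groupId for row in df.select("groupId").distinct().collect()]
    models = {}
    for gid in group_ids:
        subset = df.filter(df.groupId == gid)            # keep only this group's rows
        lr = LinearRegression(featuresCol="features", labelCol="label")
        models[gid] = lr.fit(subset)                     # one fitted model per groupId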

Spark-shell doesn't see changes coming from Kafka topic

2016-12-01 Thread Otávio Carvalho
Hello hivemind, I am trying to connect my Spark 2.0.2 cluster to an Apache Kafka 0.10 cluster via spark-shell. The connection works fine, but it is not able to receive the messages published to the topic. It doesn't throw any error, but it is not able to retrieve any message (I am sure that

newly added Executors couldn't fetch jar files from Master

2016-12-01 Thread Evgenii Morozov
Hi, I've had a working cluster with 20 workers for more than a couple of weeks. Everything was perfect. Today I added 4 more workers and none of them could fetch jar files from the master. The following means to me that the master is available to the worker, the worker is registered there, and it started

How to Check Dstream is empty or not?

2016-12-01 Thread rockinf...@gmail.com
I have integrated Flume with Spark using the Flume-style push-based approach. I need to check whether a DStream is empty. Please suggest how I can do that. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-Check-Dstream-is-empty-or-not-tp28151.html Sent
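
(Sketch, not from the thread: a minimal PySpark example of one common way to check emptiness per batch with foreachRDD; the processing function is a placeholder.)

    def handle_batch(rdd):
        if rdd.isEmpty():
            print("empty batch, nothing to do")
        else:
            process(rdd)        # placeholder for your per-batch processing

    dstream.foreachRDD(handle_batch)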

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-12-01 Thread Marco Mistroni
Kant, we need to narrow it down to reproducible code. You are using streaming; what is the content of your streamed data? If you provide that, I can run a streaming program that reads from a local dir and narrow down the problem. I have seen a similar error when doing something completely different.

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-12-01 Thread kant kodali
Sorry for the multiple emails; I just think more info is needed each time to address this problem. My Spark client program runs in client mode on a node that has 2 vCPUs and 8GB RAM (m4.large). I have 2 Spark worker nodes, each with 4 vCPUs and 16GB RAM (m3.xlarge for each Spark

Re: Spark Job not exited and shows running

2016-12-01 Thread Selvam Raman
Hi, I have run the job in cluster mode as well. The job is not ending; after some time the container just does nothing but still shows as running. In my code, every record is inserted into both Solr and Cassandra. When I ran it only for Solr, the job completed successfully. Still I did not test

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-12-01 Thread kant kodali
My batch interval is 1s, slide interval is 1s, and window interval is 1 minute. I am using a standalone cluster. I don't have any storage layer like HDFS, so I don't know what the connection between RDDs and blocks is (I know that for every batch one RDD is produced). What is a block in this context?
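
(Sketch, not from the thread: a minimal PySpark example mapping the intervals described above onto the streaming API, i.e. a 1 second batch interval with a 60 second window sliding every 1 second; the socket source is a placeholder.)

    from pyspark.streaming import StreamingContext

    ssc = StreamingContext(sc, batchDuration=1)                       # 1 s batches
    lines = ssc.socketTextStream("localhost", 9999)                   # placeholder source
    windowed = lines.window(windowDuration=60, slideDuration=1)       # 1 min window, 1 s slide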

Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread Tathagata Das
In the meantime, if you are interested, you can read the design doc in the corresponding JIRA - https://issues.apache.org/jira/browse/SPARK-18124 On Thu, Dec 1, 2016 at 12:53 AM, Tathagata Das wrote: > That feature is coming in 2.1.0. We have added watermarking,

Re: [structured streaming] How to remove outdated data when use Window Operations

2016-12-01 Thread Tathagata Das
That feature is coming in 2.1.0. We have added watermarking, that will track the event time of the data and accordingly close old windows, output its corresponding aggregate and then drop its corresponding state. But in that case, you will have to use append mode, and aggregated data of a
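
(Sketch, not from the thread: a hedged PySpark illustration of the watermarking described above, using the Spark 2.1 withWatermark API; the column names, thresholds, and console sink are illustrative.)

    from pyspark.sql.functions import window

    windowed_counts = (events                                 # a streaming DataFrame with an event-time column
        .withWatermark("eventTime", "10 minutes")             # state for windows older than this is dropped
        .groupBy(window("eventTime", "5 minutes"))
        .count())

    query = (windowed_counts.writeStream
             .outputMode("append")                            # append mode, as noted above
             .format("console")
             .start())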

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-12-01 Thread kant kodali
I also use this super(StorageLevel.MEMORY_AND_DISK_2()); inside my receiver On Wed, Nov 30, 2016 at 10:44 PM, kant kodali wrote: > Here is another transformation that might cause the error but it has to be one of these two since I only have two transformations

RE: PySpark to remote cluster

2016-12-01 Thread Schaefers, Klaus
Hi, I moved my PySpark to 2.0.1 and now I can connect. However, I cannot execute any job. I always get a "16/12/01 09:37:07 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources" error. I