Re: Get the value of DStream[(String, Iterable[String])]

2014-12-17 Thread Guillermo Ortiz
> > > The question is: what do you want to do with that count afterwards? > > -kr, Gerard. > > > On Wed, Dec 17, 2014 at 5:11 PM, Guillermo Ortiz > wrote: >> >> I'm a newbie with Spark, a simple question: >> >> val errorLines = lines.

Re: Get the value of DStream[(String, Iterable[String])]

2014-12-17 Thread Guillermo Ortiz
).size() > X){ //iterate values and do something for each element. } I think that it must be pretty basic, argh. 2014-12-17 18:43 GMT+01:00 Guillermo Ortiz : > What I would like to do is to count the number of elements and if > it's greater than a number, I have to iterat
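
A minimal sketch of the pattern this thread converges on, assuming the errorLines stream from the original question (the threshold X and the per-element work are illustrative):

    import org.apache.spark.streaming.dstream.DStream

    // errorLines: DStream[(String, Iterable[String])], as in the original question.
    // X is a hypothetical threshold; only groups larger than X are processed.
    def processLargeGroups(errorLines: DStream[(String, Iterable[String])], X: Int): Unit = {
      errorLines.foreachRDD { rdd =>
        rdd.foreach { case (key, values) =>
          if (values.size > X) {
            // iterate values and do something for each element
            values.foreach(v => println(s"$key -> $v"))
          }
        }
      }
    }

Note that the println runs on the executors, so its output lands in the executor logs, not on the driver console.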

Spark Streaming and Windows, it always counts the logs during all the windows. Why?

2014-12-26 Thread Guillermo Ortiz
I'm trying to make some operations with windows and intervals. I get data every 15 seconds, and want to have a window of 60 seconds with batch intervals of 15 seconds. I'm injecting data with ncat. If I inject 3 logs in the same interval I get into the "do something" each 15 seconds during one min
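
A minimal sketch of the setup being described, assuming a socket source fed by ncat (host, port and app name are illustrative). With a 60-second window sliding every 15 seconds, a log injected once belongs to four consecutive windows, which is why it keeps showing up in the output for a full minute:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("window-test")
    val ssc = new StreamingContext(conf, Seconds(15))   // 15-second batch interval

    // Hypothetical source, fed with e.g. "ncat -lk 9999".
    val lines = ssc.socketTextStream("localhost", 9999)

    // 60-second window sliding every 15 seconds: each batch contributes
    // to four consecutive windows before it ages out.
    val counts = lines.window(Seconds(60), Seconds(15)).count()
    counts.print()

    ssc.start()
    ssc.awaitTermination()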

Re: Spark Streaming and Windows, it always counts the logs during all the windows. Why?

2014-12-26 Thread Guillermo Ortiz
println("fin foreachRdd"); ///Just one time, when I start the program Why it's just executing the println("4...")?? shouldn't it execute all the code each 15 seconds that it's what it's defined on the context (val ssc = new StreamingCont

Re: Spark Streaming and Windows, it always counts the logs during all the windows. Why?

2014-12-26 Thread Guillermo Ortiz
Oh, I didn't understand what I was doing, my fault (too many parties this Christmas). I thought windows worked in another, weird way. Sorry for the questions.. 2014-12-26 13:42 GMT+01:00 Guillermo Ortiz : > I'm trying to understand why it's not working and I typed some println > t

Trying to execute Spark in Yarn

2015-01-08 Thread Guillermo Ortiz
I'm trying to execute Spark from a Hadoop cluster; I have created this script to try it: #!/bin/bash export HADOOP_CONF_DIR=/etc/hadoop/conf SPARK_CLASSPATH="" for lib in `ls /user/local/etc/lib/*.jar` do SPARK_CLASSPATH=$SPARK_CLASSPATH:$lib done /home/spark-1.1.1-bin-hadoop2.4/bin/spark

Re: Trying to execute Spark in Yarn

2015-01-08 Thread Guillermo Ortiz
Shixiong Zhu > > 2015-01-08 19:23 GMT+08:00 Guillermo Ortiz : >> >> I'm trying to execute Spark from a Hadoop cluster; I have created this >> script to try it: >> >> #!/bin/bash >> >> export HADOOP_CONF_DIR=/etc/hadoop/conf >> SPAR

Executing Spark, Error creating path from empty String.

2015-01-08 Thread Guillermo Ortiz
When I try to execute my task with Spark it starts to copy the jars it needs to HDFS and it finally fails; I don't know exactly why. I have checked HDFS and it copies the files, so that part seems to work. I changed the log level to debug but there's nothing else to help. What else does Spark n

Re: Executing Spark, Error creating path from empty String.

2015-01-08 Thread Guillermo Ortiz
I was adding some bad jars, I guess. I deleted all the jars and copied them again and now it works. 2015-01-08 14:15 GMT+01:00 Guillermo Ortiz : > When I try to execute my task with Spark it starts to copy the jars it > needs to HDFS and it finally fails; I don't know exactly why. I hav

Measure performance time in some spark transformations.

2018-05-12 Thread Guillermo Ortiz Fernández
I want to measure how long some different transformations in Spark take, such as map, joinWithCassandraTable and so on. What is the best approach to do it? def time[R](block: => R): R = { val t0 = System.nanoTime() val result = block val t1 = System.nanoTime() println("Elap
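
One caveat with the helper quoted above (completed here as a sketch, since the archive truncates it): transformations such as map or joinWithCassandraTable are lazy, so wrapping only the transformation call measures graph construction, not the actual work. Time an action instead:

    // Sketch of the timing helper; it measures driver-side wall time only.
    def time[R](block: => R): R = {
      val t0 = System.nanoTime()
      val result = block
      val t1 = System.nanoTime()
      println(s"Elapsed time: ${(t1 - t0) / 1e6} ms")
      result
    }

    // rdd.map(f) alone returns immediately; force the work with an action:
    // val n = time { rdd.map(f).count() }

For per-stage timing the Spark UI is usually the better tool, since it breaks the job down per stage and per task.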

Reset the offsets, Kafka 0.10 and Spark

2018-06-07 Thread Guillermo Ortiz Fernández
I'm consuming data from Kafka with createDirectStream and storing the offsets in Kafka ( https://spark.apache.org/docs/2.1.0/streaming-kafka-0-10-integration.html#kafka-itself ) val stream = KafkaUtils.createDirectStream[String, String]( streamingContext, PreferConsistent, Subscribe[String, S
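
A sketch of the pattern from the linked 0-10 integration guide, with hypothetical broker, topic and group names. Note that once the group has committed offsets to Kafka, auto.offset.reset no longer applies; resetting usually means switching to a fresh group.id (or externally clearing the group's committed offsets):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010._
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "my-group",
      "auto.offset.reset" -> "earliest",  // only used when the group has no committed offsets
      "enable.auto.commit" -> (false: java.lang.Boolean))

    val stream = KafkaUtils.createDirectStream[String, String](
      streamingContext, PreferConsistent,
      Subscribe[String, String](Seq("my-topic"), kafkaParams))

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process rdd ...
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)  // commit back to Kafka
    }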

Refresh broadcast variable when it isn't the value.

2018-08-19 Thread Guillermo Ortiz Fernández
Hello, I want to set data in a broadcast (Map) variable in Spark. Sometimes there is new data, so I have to update/refresh the values, but I'm not sure how I could do this. My idea is to use accumulators as a flag for when a cache error occurs; at that point I could read the data and reload the broa
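
Broadcast variables are read-only once created, so "refreshing" in practice means re-broadcasting from the driver. A minimal sketch of that pattern (RefreshableBroadcast and the load function are illustrative, not a Spark API):

    import org.apache.spark.SparkContext
    import org.apache.spark.broadcast.Broadcast
    import scala.reflect.ClassTag

    // Holder that can rebuild the broadcast on demand; call refresh() from
    // driver-side code, e.g. at the start of a batch inside foreachRDD.
    class RefreshableBroadcast[T: ClassTag](sc: SparkContext, load: () => T) {
      @volatile private var bc: Broadcast[T] = sc.broadcast(load())
      def value: T = bc.value
      def refresh(): Unit = synchronized {
        bc.unpersist(blocking = false)  // drop cached copies on the executors
        bc = sc.broadcast(load())       // broadcast the fresh data
      }
    }

An accumulator can signal that a refresh is needed (its value is only readable on the driver), but the re-broadcast itself has to happen on the driver, as above.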

Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Guillermo Ortiz Fernández
I'm using Spark Streaming 2.0.1 with Kafka 0.10; sometimes I get this exception and Spark dies. I couldn't see any error or problem on the machines. Does anybody know the reason for this error? java.lang.IllegalStateException: This consumer has already been closed. at org.apache.kafka.clients

java.lang.OutOfMemoryError: Java heap space - Spark driver.

2018-08-29 Thread Guillermo Ortiz Fernández
I got this error from the Spark driver; it seems that I should increase the driver memory, although it's 5g (and 4 cores) right now. It seems weird to me because I'm not using Kryo or broadcast in this process, but in the log there are references to Kryo and broadcast. How could I figure out the r

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Guillermo Ortiz Fernández
I can't... do you think it's possibly a bug in this version, of Spark or of Kafka? On Wed, 29 Aug 2018 at 22:28, Cody Koeninger () wrote: > Are you able to try a recent version of spark? > > On Wed, Aug 29, 2018 at 2:10 AM, Guillermo Ortiz Fernández > wrot

deploy-mode cluster. FileNotFoundException

2018-09-05 Thread Guillermo Ortiz Fernández
I want to execute my processes in cluster mode. As I don't know where the driver will be executed, I have to make available all the files it needs. I understand that there are two options: copy all the files to all nodes, or copy them to HDFS. My doubt is, if I want to put all the files in HDFS, isn't

Re: deploy-mode cluster. FileNotFoundException

2018-09-05 Thread Guillermo Ortiz Fernández
-2.0.2.jar:lib/kafka-clients-0.10.0.1.jar \ --files conf/${1}Conf.json iris-core-0.0.1-SNAPSHOT.jar conf/${1}Conf.json On Wed, 5 Sep 2018 at 11:11, Guillermo Ortiz Fernández (< guillermo.ortiz.f...@gmail.com>) wrote: > I want to execute my processes in cluster mode. As I don'

Trying to improve performance of the driver.

2018-09-13 Thread Guillermo Ortiz Fernández
I have a process in Spark Streaming which lasts 2 seconds. When I check where the time is spent I see about 0.8s-1s of processing time, although the global time is 2s. This one second is spent in the driver. I reviewed the code which is executed by the driver and I commented out some of this code with th

Putting record in HBase with Spark - error get regions.

2019-05-28 Thread Guillermo Ortiz Fernández
I'm executing a load process into HBase with Spark (around 150M records). At the end of the process there are a lot of failed tasks. I get this error: 19/05/28 11:02:31 ERROR client.AsyncProcess: Failed to get region location org.apache.hadoop.hbase.TableNotFoundException: my_table at org.

Re: Putting record in HBase with Spark - error get regions.

2019-05-28 Thread Guillermo Ortiz Fernández
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 19/05/28 11:11:18 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 369 On Tue, 28 May 2019 at 12:12, Guillermo Ortiz Fernández (< guillermo.ortiz.f...@gmail.com>) wrote

Parse RDD[Seq[String]] to DataFrame with types.

2019-07-17 Thread Guillermo Ortiz Fernández
I'm trying to parse an RDD[Seq[String]] to a DataFrame. Although it's a Seq of Strings, the values could have a more specific type such as Int, Boolean, Double, String and so on. For example, a line could be: "hello", "1", "bye", "1.1" "hello1", "11", "bye1", "2.1" ... The first column is always going to be a String,
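
A sketch of one way to do it, assuming the schema is known up front and that spark is a SparkSession and rdd the RDD[Seq[String]] in question (column names and types here are illustrative):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("c0", StringType),   // first column: always a String
      StructField("c1", IntegerType),
      StructField("c2", StringType),
      StructField("c3", DoubleType)))

    // Cast each string to the type its position expects before building the Row.
    val rows = rdd.map(seq => Row(seq(0), seq(1).toInt, seq(2), seq(3).toDouble))

    val df = spark.createDataFrame(rows, schema)
    df.printSchema()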
