>
>
> The question is: what do you want to do with that count afterwards?
>
> -kr, Gerard.
>
>
> On Wed, Dec 17, 2014 at 5:11 PM, Guillermo Ortiz
> wrote:
>>
>> I'm a newbie with Spark... a simple question:
>>
>> val errorLines = lines.filter(...)
>> if (errorLines.size() > X) {
>>   // iterate values and do something for each element
>> }
I think it must be pretty basic... argh.
2014-12-17 18:43 GMT+01:00 Guillermo Ortiz :
> What I would like to do is count the number of elements, and if
> it's greater than a number, iterate over them.
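For reference, a minimal sketch of how this could look with a DStream (the stream name, the filter, and the threshold X are assumptions standing in for the elided code):

import org.apache.spark.streaming.dstream.DStream

def processLargeBatches(errorLines: DStream[String], X: Long): Unit =
  errorLines.foreachRDD { rdd =>
    if (rdd.count() > X) {      // count() is an action and triggers the job
      rdd.foreach { element =>
        // do something with each element
      }
    }
  }

Note that count() runs a job per batch, so the check itself has a cost.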
I'm trying to make some operations with windows and intervals.
I get data every 15 seconds and want to have a window of 60 seconds
with batch intervals of 15 seconds.
I'm injecting data with ncat. If I inject 3 logs in the same interval,
I get into the "do something" branch every 15 seconds during one minute.
println("fin foreachRdd"); ///Just one time,
when I start the program
Why is it only executing the println("4...")? Shouldn't it execute all
the code every 15 seconds, since that's what is defined on the context
(val ssc = new StreamingContext(..., Seconds(15)))?
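For context, the usual shape of such a window (a minimal sketch; conf and the socket source are assumptions, while the 60 s / 15 s values come from the question):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("window-example")
val ssc = new StreamingContext(conf, Seconds(15))   // batch interval: 15 s

val lines = ssc.socketTextStream("localhost", 9999)
// Window of 60 s sliding every 15 s: each batch sees the last four batches.
val windowed = lines.window(Seconds(60), Seconds(15))
windowed.foreachRDD { rdd =>
  println(s"elements in the last 60 s: ${rdd.count()}")
}

ssc.start()
ssc.awaitTermination()

A record injected once stays inside the 60-second window for four consecutive slides, which is exactly why the "do something" branch fires every 15 seconds for one minute.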
Oh, I didn't understand what I was doing, my fault (too many parties
this Christmas). I thought windows worked in another, weird way. Sorry for
the questions.
2014-12-26 13:42 GMT+01:00 Guillermo Ortiz :
> I'm trying to understand why it's not working and I typed some println
> t
I'm trying to execute Spark from a Hadoop cluster. I have created this
script to try it:
#!/bin/bash
export HADOOP_CONF_DIR=/etc/hadoop/conf
SPARK_CLASSPATH=""
for lib in /user/local/etc/lib/*.jar; do
  SPARK_CLASSPATH=$SPARK_CLASSPATH:$lib
done
/home/spark-1.1.1-bin-hadoop2.4/bin/spark
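The submit command is cut off in the archive; a guess at its shape, with placeholder class and jar names (Spark 1.1.x used the yarn-cluster master string):

/home/spark-1.1.1-bin-hadoop2.4/bin/spark-submit \
  --class com.example.MyApp \
  --master yarn-cluster \
  --jars $(echo /user/local/etc/lib/*.jar | tr ' ' ',') \
  my-app.jar

Note that --jars wants a comma-separated list, while SPARK_CLASSPATH (already deprecated in Spark 1.x) is colon-separated.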
Shixiong Zhu
>
> 2015-01-08 19:23 GMT+08:00 Guillermo Ortiz :
>>
>> I'm trying to execute Spark from a Hadoop cluster. I have created this
>> script to try it:
>>
>> #!/bin/bash
>>
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>> SPAR
When I try to execute my task with Spark, it starts to copy the jars it
needs to HDFS and then it finally fails; I don't know exactly why. I have
checked HDFS and it copies the files, so that part seems to work.
I changed the log level to debug but there's nothing else to help.
What else does Spark n
I was adding some bad jars, I guess. I deleted all the jars and copied
them again, and now it works.
2015-01-08 14:15 GMT+01:00 Guillermo Ortiz :
> When I try to execute my task with Spark it starts to copy the jars it
> needs to HDFS and it finally fails, I don't know exactly why. I hav
I want to measure how long different transformations take in Spark,
such as map, joinWithCassandraTable, and so on. What is the best
approximation to do it?
def time[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + " ns")
  result
}
I'm consuming data from Kafka with createDirectStream and storing the
offsets in Kafka (
https://spark.apache.org/docs/2.1.0/streaming-kafka-0-10-integration.html#kafka-itself
)
val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)
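For completeness, the offset-commit pattern from the linked page ("storing offsets in Kafka itself"); topics and kafkaParams are assumed to be defined as in the docs:

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // process the batch here, then commit its offsets back to Kafka
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}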
Hello,
I want to set data in a broadcast (Map) variable in Spark.
Sometimes there is new data, so I have to update/refresh the values, but
I'm not sure how I could do this.
My idea is to use an accumulator as a flag when a cache error occurs; at
that point I could read the data and reload the broadcast variable.
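A common workaround sketch (not an official API; loadMap and needsRefresh are hypothetical helpers standing in for the data source and the accumulator check):

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.dstream.DStream

def loadMap(): Map[String, String] = ???   // hypothetical: read fresh data
def needsRefresh(): Boolean = ???          // hypothetical: check the flag

def run(sc: SparkContext, stream: DStream[String]): Unit = {
  var lookup: Broadcast[Map[String, String]] = sc.broadcast(loadMap())
  stream.foreachRDD { rdd =>
    if (needsRefresh()) {        // evaluated on the driver, once per batch
      lookup.unpersist()         // drop the stale copies on the executors
      lookup = sc.broadcast(loadMap())
    }
    val current = lookup         // stable reference for this batch's closure
    rdd.foreach(x => println(current.value.get(x)))
  }
}

The key point is that foreachRDD's outer body runs on the driver, so the var can be swapped there between batches.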
I'm using Spark Streaming 2.0.1 with Kafka 0.10. Sometimes I get this
exception and Spark dies.
I couldn't see any error or problem among the machines; does anybody know
the reason for this error?
java.lang.IllegalStateException: This consumer has already been closed.
at
org.apache.kafka.clients
I got this error from the Spark driver; it seems that I should increase
the memory in the driver, although it's 5 GB (and 4 cores) right now. It
seems weird to me because I'm not using Kryo or broadcast in this process,
but in the log there are references to Kryo and broadcast.
How could I figure out the reason?
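In case it helps readers, driver resources are usually raised at submit time; the values here are illustrative, not a recommendation:

spark-submit \
  --driver-memory 8g \
  --driver-cores 4 \
  ...

(--driver-cores applies in cluster mode; in client mode the driver JVM has to be sized before it launches, which is why this belongs on spark-submit rather than in SparkConf.)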
I can't... do you think it could be a bug in this version, either of
Spark or of Kafka?
On Wed, Aug 29, 2018 at 22:28, Cody Koeninger ()
wrote:
> Are you able to try a recent version of spark?
>
> On Wed, Aug 29, 2018 at 2:10 AM, Guillermo Ortiz Fernández
> wrot
I want to execute my processes in cluster mode. As I don't know where the
driver will be executed, I have to make all the files it needs available. I
understand that there are two options: copy all the files to all the nodes,
or copy them to HDFS.
My doubt is, if I want to put all the files in HDFS, isn't
-2.0.2.jar:lib/kafka-clients-0.10.0.1.jar
\
--files conf/${1}Conf.json iris-core-0.0.1-SNAPSHOT.jar conf/${1}Conf.json
On Wed, Sep 5, 2018 at 11:11, Guillermo Ortiz Fernández (<
guillermo.ortiz.f...@gmail.com>) wrote:
> I want to execute my processes in cluster mode. As I don'
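For what it's worth, the reading side in YARN cluster mode (a hedged sketch: files shipped with --files land in each container's working directory and can be opened by base name; the file name is a placeholder):

import scala.io.Source

// --files conf/myAppConf.json makes the file available locally
// under its base name, on the driver and on every executor.
val confJson = Source.fromFile("myAppConf.json").mkString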
I have a process in Spark Streaming which lasts 2 seconds. When I check
where the time is spent, I see about 0.8s-1s of processing time, although
the global time is 2s. This extra second is spent in the driver.
I reviewed the code which is executed by the driver and I commented out
some of this code with th
I'm executing a load process into HBase with Spark (around 150M records).
At the end of the process there are a lot of failed tasks.
I get this error:
19/05/28 11:02:31 ERROR client.AsyncProcess: Failed to get region location
org.apache.hadoop.hbase.TableNotFoundException: my_table
at
org.
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/05/28 11:11:18 INFO executor.CoarseGrainedExecutorBackend: Got
assigned task 369
On Tue, May 28, 2019 at 12:12, Guillermo Ortiz Fernández (<
guillermo.ortiz.f...@gmail.com>) wrote:
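A quick sanity check that is sometimes useful for TableNotFoundException (a sketch; it assumes hbase-site.xml is on the classpath and that the table might live in a non-default namespace):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
val admin = conn.getAdmin
// If this prints false, the tasks are resolving a different cluster or
// namespace than expected (e.g. "ns:my_table" vs "my_table").
println(admin.tableExists(TableName.valueOf("my_table")))
admin.close()
conn.close()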
I'm trying to parse an RDD[Seq[String]] to a DataFrame.
Although it's a Seq of Strings, the values could have a more specific type,
such as Int, Boolean, Double, String, and so on.
For example, a line could be:
"hello", "1", "bye", "1.1"
"hello1", "11", "bye1", "2.1"
...
The first column is always going to be a String,
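The message is cut off here, but a hedged sketch of one common approach (the four-column schema and the casts are assumptions based on the sample lines):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("seq-to-df").getOrCreate()

def toDf(rdd: RDD[Seq[String]]): DataFrame = {
  // Read everything as strings first, then cast the typed columns.
  val schema = StructType(Seq(
    StructField("c0", StringType),
    StructField("c1", StringType),
    StructField("c2", StringType),
    StructField("c3", StringType)))
  val raw = spark.createDataFrame(rdd.map(Row.fromSeq(_)), schema)
  raw.select(
    raw("c0"),                                // always a String
    raw("c1").cast(IntegerType).as("c1"),
    raw("c2"),
    raw("c3").cast(DoubleType).as("c3"))
}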