This error message no longer appears now that I have upgraded to 1.6.0.
--
Cheers,
Todd Leo
On Tue, Feb 9, 2016 at 9:07 AM SLiZn Liu <sliznmail...@gmail.com> wrote:
> At least it works for me for now; I temporarily disabled the Kryo serializer
> until I upgrade to 1.6.0. Appreciate your update. :)
>
Hi Spark Users,
I’m running Spark jobs on Mesos, and sometimes I get a vast number of
TaskScheduler errors:
ERROR TaskSchedulerImpl: Ignoring update with state FINISHED for TID 1161
because its task set is gone (this is likely the result of receiving
duplicate task finished status updates)
It
| 2015-11-0400:00:31|
> |1446566431 | 2015-11-0400:00:31|
> +--+------+
>
>
>
>
> On Sat, Feb 6, 2016 at 11:44 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:
>
>> Hi Spark Users Group,
>>
>> I have a csv file to analyze wi
I’ve found the trigger of my issue: if I start spark-shell, or submit via
spark-submit, with --conf
spark.serializer=org.apache.spark.serializer.KryoSerializer, the DataFrame
content goes wrong, as I described earlier.
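In case it helps, a quick way to confirm which serializer a shell session actually picked up (just a sketch; the second argument is only a fallback label printed when the key is unset, not a configuration change):
// prints the configured serializer class, or the default label if none was set
sc.getConf.get("spark.serializer", "org.apache.spark.serializer.JavaSerializer (default)")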
On Mon, Feb 8, 2016 at 5:42 PM SLiZn Liu <sliznmail...@gmail.com>
At least it works for me for now; I temporarily disabled the Kryo serializer
until I upgrade to 1.6.0. Appreciate your update. :)
Luciano Resende <luckbr1...@gmail.com> wrote on Tue, Feb 9, 2016 at 02:37:
> Sorry, same expected results with trunk and Kryo serializer
>
> On Mon, Feb 8, 2016 at 4:1
Plus, I’m using *Spark 1.5.2*, with *spark-csv 1.3.0*. Also tried
HiveContext, but the result is exactly the same.
On Sun, Feb 7, 2016 at 3:44 PM SLiZn Liu <sliznmail...@gmail.com> wrote:
> Hi Spark Users Group,
>
> I have a csv file to analyze with Spark, but I’m having trouble
are missing.
Good to know the way to show the whole content in a cell.
—
BR,
Todd Leo
On Sun, Feb 7, 2016 at 5:42 PM Igor Berman <igor.ber...@gmail.com> wrote:
> show has argument of truncate
> pass false so it wont truncate your results
>
> On 7 February 2016 at 11:01, SL
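For anyone searching later, a minimal sketch of the call being suggested (df stands in for whatever DataFrame is being inspected; the two-argument show is available from Spark 1.5 on):
df.show(20, false)  // numRows = 20, truncate = false: prints full cell contents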
and have great fortune in the Year of the Monkey!
—
BR,
Todd Leo
On Sun, Feb 7, 2016 at 6:09 PM SLiZn Liu <sliznmail...@gmail.com> wrote:
> Hi Igor,
>
> In my case, it’s not a matter of *truncate*. As the show() function in
> Spark API doc reads,
>
> truncate: Whether trunca
Hi Spark Users Group,
I have a csv file to analyze with Spark, but I’m having trouble importing it
as a DataFrame.
Here’s a minimal reproducible example. Suppose I have a
*10(rows)x2(cols)* *space-delimited csv* file, shown below:
1446566430 2015-11-0400:00:30
1446566430 2015-11-0400:00:30
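A minimal sketch of how such a space-delimited file is typically loaded with spark-csv 1.3.0 (the path and column names below are placeholders, not from the original post):
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("epoch", LongType, nullable = false),
  StructField("datetime", StringType, nullable = false)))

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("delimiter", " ")   // space-delimited input
  .option("header", "false")
  .schema(schema)
  .load("hdfs:///path/to/sample.csv")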
Hi Gaurav,
Your graph can be saved to graph databases like Neo4j or Titan through
their drivers, which eventually persist it to disk.
BR,
Todd
Gaurav Kumar
gauravkuma...@gmail.com> wrote on Fri, Nov 13, 2015 at 22:08:
> Hi,
>
> I was wondering how to save a graph to disk and load it back again. I know
> how
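If a plain round trip through HDFS is enough, here is a minimal sketch (paths are placeholders; it assumes graph is a Graph[String, Int], i.e. String vertex attributes and Int edge attributes):
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// save the two underlying RDDs
graph.vertices.saveAsObjectFile("hdfs:///graphs/demo/vertices")
graph.edges.saveAsObjectFile("hdfs:///graphs/demo/edges")

// load them back and rebuild the graph
val vertices = sc.objectFile[(VertexId, String)]("hdfs:///graphs/demo/vertices")
val edges    = sc.objectFile[Edge[Int]]("hdfs:///graphs/demo/edges")
val restored: Graph[String, Int] = Graph(vertices, edges)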
Hi Jerry,
I think you are referring to --no-switch_user. =)
chiling...@gmail.com> wrote on Mon, Oct 19, 2015 at 21:05:
> Can you try setting SPARK_USER at the driver? It is used to impersonate
> users at the executor. So if you have a user set up for launching Spark jobs
> on the executor machines, simply
ons($"col")) .rdd.map( x: Row => (k, v) )
> .combineByKey()
>
> Deenar
>
> On 14 October 2015 at 05:18, SLiZn Liu <sliznmail...@gmail.com> wrote:
>
>> Hey Spark Users,
>>
>> I kept getting java.lang.OutOfMemoryError: Java heap space as I read a
ks.com>
>> wrote:
>>
>> import org.apache.spark.sql.functions._
>>
>> df.groupBy("category")
>> .agg(callUDF("collect_set", df("id")).as("id_list"))
>>
>> On Mon, Oct 12, 2015 at 11:08 PM, SLiZn Liu <
;category")
> .agg(callUDF("collect_set", df("id")).as("id_list"))
>
> On Mon, Oct 12, 2015 at 11:08 PM, SLiZn Liu <sliznmail...@gmail.com>
> wrote:
>
>> Hey Spark users,
>>
>> I'm trying to group by a dataframe, by appen
Hey Spark Users,
I kept getting java.lang.OutOfMemoryError: Java heap space as I read a
massive number of JSON files, iteratively via read.json(). Even though the
resulting RDD is rather small, I still get the OOM error. The brief structure
of my program reads as follows, in pseudo-code:
(r))
>
>
>
> You can always convert the obtained RDD back to a DataFrame after the
> transformation and reduce.
>
>
> Regards,
> Rishitesh Mishra,
> SnappyData . (http://www.snappydata.io/)
>
>
> https://www.linkedin.com/profile/view?id=AAIAAAIFdkMB_v-nolCrFH6_pKf
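One alternative sketch for the OOM question above: skip the per-file read.json() loop, load every file into a single text RDD with a glob, and parse once (the glob path is a placeholder; DataFrameReader.json(RDD[String]) exists since Spark 1.4):
val raw = sc.textFile("hdfs:///data/json/*/*.json")  // all files as one RDD[String], one JSON record per line
val df  = sqlContext.read.json(raw)                  // parse once instead of iterative read.json + union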
Hey Spark users,
I'm trying to group a DataFrame by a column, appending occurrences into a list
instead of a count.
Let's say we have a dataframe as shown below:
| category | id |
|----------|:--:|
| A        | 1  |
| A        | 2  |
| B        | 3  |
| B        | 4  |
| C        | 5  |
ideally, after
m> wrote:
Also, you could switch to the Direct Kafka API which was first released as
> experimental in 1.3. In 1.5 we graduated it from experimental, but it’s
> quite usable in Spark 1.3.1.
>
> TD
>
> On Tue, Sep 22, 2015 at 7:45 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:
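For completeness, a minimal sketch of the direct approach against the Kafka 0.8 consumer API (broker addresses and the topic name are placeholders; ssc is your StreamingContext):
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("events")  // placeholder topic name

// one RDD partition per Kafka partition, no receivers involved
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)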
es.apache.org/jira/browse/SPARK-8882
>
> On Tue, Sep 22, 2015 at 12:17 AM, SLiZn Liu <sliznmail...@gmail.com>
> wrote:
>
>> Hi spark users,
>>
>> In our Spark Streaming app via Kafka integration on Mesos, we initialized 3
>> receivers to receive 3 Kafka partition
Hi spark users,
In our Spark Streaming app via Kafka integration on Mesos, we initialized 3
receivers to receive 3 Kafka partitions, but an imbalance in record-receiving
rates has been observed: with spark.streaming.receiver.maxRate set to 120,
sometimes one of the receivers gets very close to the limit
org.apache.hbase:hbase:1.1.1, junit:junit:x
--repositories http://some.other.repo,http://some.other.repo2 $YOUR_JAR
Best,
Burak
On Mon, Jun 29, 2015 at 11:33 PM, SLiZn Liu sliznmail...@gmail.com
wrote:
Hi Burak,
Is the `--packages` flag only available for Maven coordinates, with no sbt support?
On Tue, Jun 30, 2015 at 2:26
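For reference, a sketch of the full command shape; the class, coordinates and jar name are placeholders, and --packages takes Maven coordinates regardless of whether the project itself is built with sbt:
spark-submit --class com.example.Demo \
  --packages org.apache.hbase:hbase:1.1.1 \
  --repositories http://some.other.repo,http://some.other.repo2 \
  demo-assembly-1.0.jar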
29, 2015 at 10:46 PM, SLiZn Liu sliznmail...@gmail.com
wrote:
Hey Spark Users,
I'm writing a demo with Spark and HBase. What I've done is packaging a
**fat jar**: place dependencies in `build.sbt`, and use `sbt assembly` to
package **all dependencies** into one big jar. The rest work is copy
Hey Spark Users,
I'm writing a demo with Spark and HBase. What I've done is package a
**fat jar**: declare dependencies in `build.sbt`, and use `sbt assembly` to
package **all dependencies** into one big jar. The remaining work is to copy
the fat jar to the Spark master node and then launch it by
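A minimal sketch of the sbt side of that setup (versions here are assumptions; marking Spark itself as "provided" keeps it out of the fat jar, since the cluster already supplies it):
// build.sbt
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "1.4.0" % "provided",
  "org.apache.hbase"  % "hbase-client" % "1.1.1")

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")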
Hi Spark Users,
I'm trying to load a genuinely big file (50 GB when compressed as a gzip file,
stored in HDFS) by receiving a DStream using `ssc.textFileStream`, as this
file cannot fit in my memory. However, it looks like no RDD will be
received until I copy this big file to a prior-specified
in this use case? The 50 GB need not be in
memory. Give it a try with a high number of partitions.
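A sketch of that suggestion in plain batch form (path and partition count are placeholders); note that a single gzip file is not splittable, so it arrives as one partition and needs an explicit repartition before any heavy work:
// a single gzip file cannot be split, so it loads as one partition
val lines = sc.textFile("hdfs:///data/big-file.gz").repartition(400)  // spread records before heavy processing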
On 11 Jun 2015 23:09, SLiZn Liu sliznmail...@gmail.com wrote:
Hi Spark Users,
I'm trying to load a literally big file (50GB when compressed as gzip
file, stored in HDFS) by receiving a DStream using
However, this returns a single column c, without showing the original col1.
On Thu, May 21, 2015 at 11:25 PM Ram Sriharsha sriharsha@gmail.com
wrote:
df.groupBy($"col1").agg(count($"col1").as("c")).show
On Thu, May 21, 2015 at 3:09 AM, SLiZn Liu sliznmail...@gmail.com wrote:
Hi Spark
, 2015 at 11:22 PM, SLiZn Liu sliznmail...@gmail.com
wrote:
However this returns a single column of c, without showing the original
col1.
On Thu, May 21, 2015 at 11:25 PM Ram Sriharsha sriharsha@gmail.com
wrote:
df.groupBy($"col1").agg(count($"col1").as("c")).show
On Thu, May 21, 2015 at 3
Hi Spark Users Group,
I’m doing a groupBy operation on my DataFrame *df* as follows, to get the
count for each value of col1:
df.groupBy("col1").agg("col1" -> "count").show // I don't know if I should
write like this.
col1  COUNT(col1#347)
aaa   2
bbb   4
ccc   4
...
and more...
As I’d like to
= ...
sqlContext.createDataFrame(rdd, schema)
2015-05-13 12:00 GMT+02:00 SLiZn Liu sliznmail...@gmail.com:
Additionally, after I successfully packaged the code and submitted it via
spark-submit
webcat_2.11-1.0.jar, the following error was thrown at the line where
toDF() was called:
Exception in thread
toDF is not a member of RDD object
To: SLiZn Liu sliznmail...@gmail.com
Are you sure that you are submitting it correctly? Can you post the entire
command you are using to run the .jar file via spark-submit?
On Wed, May 13, 2015 at 4:07 PM, SLiZn Liu sliznmail...@gmail.com wrote:
No, creating
. What else should I try?
REGARDS,
Todd Leo
On Wed, May 13, 2015 at 11:27 AM SLiZn Liu sliznmail...@gmail.com wrote:
Thanks folks, really appreciate all your replies! I tried each of your
suggestions and in particular, *Animesh*‘s second suggestion of *making
case class definition global* helped
Hi User Group,
I’m trying to reproduce the example on Spark SQL Programming Guide
https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection,
and got a compile error when packaging with sbt:
[error] myfile.scala:30: value toDF is not a member of
wrote:
you need to instantiate a SQLContext :
val sc : SparkContext = ...
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
On Tue, May 12, 2015 at 12:29, SLiZn Liu sliznmail...@gmail.com wrote:
I added `libraryDependencies += "org.apache.spark" % "spark-sql_2.11" %
"1.3.1
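Pulling the fixes from this thread together, a minimal self-contained sketch (Spark 1.3.x API; all names are placeholders): the case class is defined at the top level and sqlContext.implicits._ is imported before toDF is called:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Record(id: Long, label: String)  // defined globally, not inside a method

object ToDfDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("toDF-demo"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._           // brings toDF into scope for RDDs of case classes

    val df = sc.parallelize(Seq(Record(1L, "a"), Record(2L, "b"))).toDF()
    df.show()
  }
}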
Hi,
I am using *Spark SQL* to query on my *Hive cluster*, following Spark SQL
and DataFrame Guide
https://spark.apache.org/docs/latest/sql-programming-guide.html step by
step. However, my HiveQL via sqlContext.sql() fails and
java.lang.OutOfMemoryError was raised. The expected result of such