Re: RE: How to compile Spark with private build of Hadoop

2016-03-08 Thread fightf...@163.com
customized hadoop jar and its related pom.xml to the nexus repository. Check the link for reference: https://books.sonatype.com/nexus-book/reference/staging-deployment.html fightf...@163.com From: Lu, Yingqi Date: 2016-03-08 15:23 To: fightf...@163.com; user Subject: RE: How to compile Spark
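
A minimal sketch of deploying such a jar with the maven deploy plugin; the repository URL, repository id and the "2.6.0-custom" version below are made-up placeholders:
  mvn deploy:deploy-file \
    -Durl=http://nexus.example.com/content/repositories/releases \
    -DrepositoryId=my-nexus \
    -Dfile=hadoop-common-2.6.0-custom.jar \
    -DpomFile=hadoop-common-2.6.0-custom.pom \
    -DgroupId=org.apache.hadoop \
    -DartifactId=hadoop-common \
    -Dversion=2.6.0-custom \
    -Dpackaging=jar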

Re: How to compile Spark with private build of Hadoop

2016-03-07 Thread fightf...@163.com
I think you can establish your own maven repository and deploy your modified hadoop binary jar with your modified version number. Then you can add your repository in spark pom.xml and use mvn -Dhadoop.version= fightf...@163.com From: Lu, Yingqi Date: 2016-03-08 15:09 To: user
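
A rough sketch of what that could look like; the repository id/URL and the "2.6.0-custom" version are placeholders, and the hadoop profile name depends on your Spark version:
  <!-- added to the <repositories> section of Spark's pom.xml -->
  <repository>
    <id>my-nexus</id>
    <url>http://nexus.example.com/content/repositories/releases</url>
  </repository>
  # then build Spark against the private Hadoop version
  mvn -Phadoop-2.6 -Dhadoop.version=2.6.0-custom -DskipTests clean package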

Re: spark 1.6 Not able to start spark

2016-02-22 Thread fightf...@163.com
I think this may be some permission issue. Check your Spark conf for Hadoop-related settings. fightf...@163.com From: Arunkumar Pillai Date: 2016-02-23 14:08 To: user Subject: spark 1.6 Not able to start spark Hi, when I try to start spark-shell I'm getting the following error Exception in thread "

Re: Re: About cache table performance in spark sql

2016-02-04 Thread fightf...@163.com
Oh, thanks. Makes sense to me. Best, Sun. fightf...@163.com From: Takeshi Yamamuro Date: 2016-02-04 16:01 To: fightf...@163.com CC: user Subject: Re: Re: About cache table performance in spark sql Hi, Parquet data are column-wise and highly compressed, so the size of deserialized rows

clear cache using spark sql cli

2016-02-03 Thread fightf...@163.com
Hi, how could I clear the cache (execute a sql query without any cache) using the spark sql cli? Is there any command available? Best, Sun. fightf...@163.com

Re: Re: clear cache using spark sql cli

2016-02-03 Thread fightf...@163.com
Hi, Ted. Yes, I had seen that issue. But it seems that in the spark-sql cli I cannot run a command like sqlContext.clearCache(). Is this right? In the spark-sql cli I can only run sql queries, so I want to see if there are any available options to achieve this. Best, Sun. fightf...@163.com
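
For reference, the spark-sql cli does accept cache-related SQL statements directly; a small sketch (the table name is just an example, and CLEAR CACHE is only available in newer Spark versions):
  spark-sql> CACHE TABLE video_test;
  spark-sql> UNCACHE TABLE video_test;
  spark-sql> CLEAR CACHE;  -- drops all cached tables, if your version supports it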

Re: Re: About cache table performance in spark sql

2016-02-03 Thread fightf...@163.com
? From impala I get the overall parquet file size of about 24.59GB. Would be good to have some correction on this. Best, Sun. fightf...@163.com From: Prabhu Joseph Date: 2016-02-04 14:35 To: fightf...@163.com CC: user Subject: Re: About cache table performance in spark sql Sun, When

Re: Re: clear cache using spark sql cli

2016-02-03 Thread fightf...@163.com
...@163.com From: Ted Yu Date: 2016-02-04 11:49 To: fightf...@163.com CC: user Subject: Re: Re: clear cache using spark sql cli In spark-shell, I can do: scala> sqlContext.clearCache() Is that not the case for you ? On Wed, Feb 3, 2016 at 7:35 PM, fightf...@163.com <fightf...@163.com>

About cache table performance in spark sql

2016-02-03 Thread fightf...@163.com
age cannot hold the 24.59GB+ table size in memory. But why is the performance so different and even so bad? Best, Sun. fightf...@163.com

Re: Re: spark dataframe jdbc read/write using dbcp connection pool

2016-01-20 Thread fightf...@163.com
cessfully. Do I need to increase the partitions? Or are there any other alternatives I can choose to tune this? Best, Sun. fightf...@163.com From: fightf...@163.com Date: 2016-01-20 15:06 To: 刘虓 CC: user Subject: Re: Re: spark dataframe jdbc read/write using dbcp connection pool Hi, Thanks a lot

Re: Re: spark dataframe jdbc read/write using dbcp connection pool

2016-01-20 Thread fightf...@163.com
was 377,769 milliseconds ago. The last packet sent successfully to the server was 377,790 milliseconds ago. Do I need to increase the partitions? Or shall I write a parquet file for each partition in an iterative way? Thanks a lot for your advice. Best, Sun. fightf...@163.com From: 刘虓 Date

spark dataframe jdbc read/write using dbcp connection pool

2016-01-19 Thread fightf...@163.com
1 in stage 0.0 (TID 2) com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure fightf...@163.com

Re: Re: spark dataframe jdbc read/write using dbcp connection pool

2016-01-19 Thread fightf...@163.com
4") The added_year column in the mysql table contains the range (1985-2015), and I pass the numPartitions property for partitioning purposes. Is this what you recommend? Can you advise a little more on the implementation? Best, Sun. fightf...@163.com From: 刘虓 Date: 2016-01-20 11:26
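
As a point of reference, a minimal sketch of a partitioned JDBC read in Spark 1.x using a numeric column such as added_year; the connection URL, driver, table name and partition count are placeholder assumptions:
  val df = sqlContext.read.format("jdbc").options(Map(
    "url" -> "jdbc:mysql://dbhost:3306/mydb",   // placeholder connection info
    "driver" -> "com.mysql.jdbc.Driver",
    "dbtable" -> "video",                       // placeholder table name
    "partitionColumn" -> "added_year",
    "lowerBound" -> "1985",
    "upperBound" -> "2015",
    "numPartitions" -> "31"                     // e.g. roughly one partition per year
  )).load()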

spark dataframe read large mysql table running super slow

2016-01-06 Thread fightf...@163.com
rTempTable("video_test") sqlContext.sql("select count(1) from video_test").show() Overall, the load process would get stuck and hit a connection timeout. The MySQL table holds about 100 million records. Would be happy to provide more usable info. Best, Sun. fightf...@163.com

Re: Spark 1.5.2 compatible spark-cassandra-connector

2015-12-29 Thread fightf...@163.com
Hi, Vivek M. I have tried the 1.5.x spark-cassandra-connector and indeed encountered some classpath issues, mainly with the guava dependency. I believe that can be solved by some maven config, but I have not tried that yet. Best, Sun. fightf...@163.com From: vivek.meghanat...@wipro.com Date

Re: How can I get the column data based on specific column name and then stored these data in array or list ?

2015-12-25 Thread fightf...@163.com
Emm... I think you can do a df.map and store each column value into your list. fightf...@163.com From: zml张明磊 Date: 2015-12-25 15:33 To: user@spark.apache.org CC: dev-subscr...@spark.apache.org Subject: How can I get the column data based on specific column name and then stored these data in array
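
A small sketch of that idea in Spark 1.x, assuming a hypothetical string column named "name":
  // select a single column, extract its value from each Row, and collect to the driver
  val names: Array[String] = df.select("name").map(_.getString(0)).collect()
  val nameList: List[String] = names.toList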

Re: Re: Spark assembly in Maven repo?

2015-12-11 Thread fightf...@163.com
Agree with you that the assembly jar is not good to publish. However, what he really needs is to fetch an updatable maven jar file. fightf...@163.com From: Mark Hamstra Date: 2015-12-11 15:34 To: fightf...@163.com CC: Xiaoyong Zhu; Jeff Zhang; user; Zhaomin Xu; Joe Zhang (SDE) Subject: Re: RE

Re: RE: Spark assembly in Maven repo?

2015-12-10 Thread fightf...@163.com
Using maven to download the assembly jar is fine. I would recommend deploying this assembly jar to your local maven repo, i.e. a nexus repo, or more likely a snapshot repository. fightf...@163.com From: Xiaoyong Zhu Date: 2015-12-11 15:10 To: Jeff Zhang CC: user@spark.apache.org; Zhaomin Xu

Re: Re: About Spark On Hbase

2015-12-09 Thread fightf...@163.com
of using this. fightf...@163.com From: censj Date: 2015-12-09 15:44 To: fightf...@163.com CC: user@spark.apache.org Subject: Re: About Spark On Hbase So, how do I get this jar? I use an sbt package project; I did not find the sbt lib. On 2015-12-09, at 15:42, fightf...@163.com wrote: I don't think it really needs CDH

count distinct in spark sql aggregation

2015-12-09 Thread fightf...@163.com
and got the daily distinct count. However, I am not sure whether this implementation is an efficient workaround. Hope someone can shed a little light on this. Best, Sun. fightf...@163.com
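
A tiny sketch of the straightforward SQL form of that aggregation; the table and column names (events, dt, user_id) are hypothetical:
  sqlContext.sql(
    "SELECT dt, COUNT(DISTINCT user_id) AS daily_distinct_users FROM events GROUP BY dt"
  ).show()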

Re: Re: About Spark On Hbase

2015-12-08 Thread fightf...@163.com
I don't think it really needs the CDH component. Just use the API. fightf...@163.com From: censj Date: 2015-12-09 15:31 To: fightf...@163.com CC: user@spark.apache.org Subject: Re: About Spark On Hbase But this is dependent on CDH. I did not install CDH. On 2015-12-09, at 15:18, fightf...@163.com wrote: Actually

Re: About Spark On Hbase

2015-12-08 Thread fightf...@163.com
Actually you can refer to https://github.com/cloudera-labs/SparkOnHBase Also, HBASE-13992 already integrates that feature into the hbase side, but that feature has not been released. Best, Sun. fightf...@163.com From: censj Date: 2015-12-09 15:04 To: user@spark.apache.org Subject: About

Re: Re: spark sql cli query results written to file ?

2015-12-03 Thread fightf...@163.com
Well, sorry for the late response and thanks a lot for pointing out the clue. fightf...@163.com From: Akhil Das Date: 2015-12-03 14:50 To: Sahil Sareen CC: fightf...@163.com; user Subject: Re: spark sql cli query results written to file ? Oops 3 mins late. :) Thanks Best Regards On Thu, Dec 3

spark sql cli query results written to file ?

2015-12-02 Thread fightf...@163.com
Hi, how could I save the results of queries run in the spark sql cli and write them to some local file? Is there any available command? Thanks, Sun. fightf...@163.com
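
One common approach, sketched here under the assumption that plain shell redirection is acceptable (query, table and output path are placeholders):
  spark-sql -e "SELECT id, name FROM my_table" > /tmp/result.txt
  spark-sql -f my_query.sql > /tmp/result.txt   # or run a query file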

Re: New to Spark

2015-12-01 Thread fightf...@163.com
and hive config; that would help to locate the root cause of the problem. Best, Sun. fightf...@163.com From: Ashok Kumar Date: 2015-12-01 18:54 To: user@spark.apache.org Subject: New to Spark Hi, I am new to Spark. I am trying to use spark-sql with SPARK CREATED and HIVE CREATED tables. I have

Re: RE: error while creating HiveContext

2015-11-27 Thread fightf...@163.com
Could you provide your hive-site.xml file info ? Best, Sun. fightf...@163.com From: Chandra Mohan, Ananda Vel Murugan Date: 2015-11-27 17:04 To: fightf...@163.com; user Subject: RE: error while creating HiveContext Hi, I verified and I could see hive-site.xml in spark conf directory

Re: error while creating HiveContext

2015-11-26 Thread fightf...@163.com
Hi, I think you just need to put the hive-site.xml in the spark/conf directory and it will be loaded into the Spark classpath. Best, Sun. fightf...@163.com From: Chandra Mohan, Ananda Vel Murugan Date: 2015-11-27 15:04 To: user Subject: error while creating HiveContext Hi, I am building
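
For example, something along these lines (the source path is a placeholder and depends on the Hive installation):
  cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/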

Re: Spark Thrift doesn't start

2015-11-10 Thread fightf...@163.com
I think the exception info says clearly that you may be missing some tez-related jar on the Spark Thrift Server classpath. fightf...@163.com From: DaeHyun Ryu Date: 2015-11-11 14:47 To: user Subject: Spark Thrift doesn't start Hi folks, I configured tez as the execution engine of Hive. After done

Re: Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread fightf...@163.com
Hi, have you ever considered cassandra as a replacement? We now have almost the same usage as your engine, e.g. using mysql to store the initially aggregated data. Can you share more about your kind of Cube queries? We are very interested in that arch too :) Best, Sun. fightf...@163.com

Re: Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread fightf...@163.com
for prompt response. fightf...@163.com From: tsh Date: 2015-11-10 02:56 To: fightf...@163.com; user; dev Subject: Re: OLAP query using spark dataframe with cassandra Hi, I'm in the same position right now: we are going to implement something like OLAP BI + Machine Learning explorations on the same

OLAP query using spark dataframe with cassandra

2015-11-08 Thread fightf...@163.com
-apache-cassandra-and-spark fightf...@163.com

Re: Re: OLAP query using spark dataframe with cassandra

2015-11-08 Thread fightf...@163.com
of olap architecture. And we are happy to hear more use cases from this community. Best, Sun. fightf...@163.com From: Jörn Franke Date: 2015-11-09 14:40 To: fightf...@163.com CC: user; dev Subject: Re: OLAP query using spark dataframe with cassandra Is there any distributor supporting

Re: spark to hbase

2015-10-27 Thread fightf...@163.com
Hi, I notice that you configured the following: configuration.set("hbase.master", "192.168.1:6"); Did you mistype the host IP? Best, Sun. fightf...@163.com From: jinhong lu Date: 2015-10-27 17:22 To: spark users Subject: spark to hbase Hi, I write my result to hd
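
For comparison, a hedged sketch of a typical HBase client configuration; the host, ports and quorum below are placeholder values, not the poster's actual setup:
  import org.apache.hadoop.hbase.HBaseConfiguration
  val configuration = HBaseConfiguration.create()
  // full host plus port; 60000 is the default HMaster RPC port
  configuration.set("hbase.master", "192.168.1.10:60000")
  configuration.set("hbase.zookeeper.quorum", "192.168.1.10")
  configuration.set("hbase.zookeeper.property.clientPort", "2181")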

Spark standalone/Mesos on top of Ceph

2015-09-22 Thread fightf...@163.com
such progress ? Best, Sun. fightf...@163.com

Re: Re: Spark standalone/Mesos on top of Ceph

2015-09-22 Thread fightf...@163.com
Gateway s3 rest api, agreed on such inconvenience and some incompatibilities. However, we have not yet researched and tested radosgw much, but we have some small requirements for using the gateway in some use cases. Hoping for more consideration and discussion. Best, Sun. fightf...@163.com From: Jerry

Re: PermGen Space Error

2015-07-29 Thread fightf...@163.com
Hi, Sarath, did you try to set and increase spark.executor.extraJavaOptions with -XX:PermSize= -XX:MaxPermSize= ? fightf...@163.com From: Sarath Chandra Date: 2015-07-29 17:39 To: user@spark.apache.org Subject: PermGen Space Error Dear All, I'm using - = Spark 1.2.0 = Hive 0.13.1 = Mesos
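
A minimal sketch of how that could be passed on submit; the sizes and the application name/jar are placeholder assumptions, not recommendations:
  spark-submit \
    --conf "spark.driver.extraJavaOptions=-XX:PermSize=128m -XX:MaxPermSize=512m" \
    --conf "spark.executor.extraJavaOptions=-XX:PermSize=128m -XX:MaxPermSize=512m" \
    --class com.example.MyApp myapp.jar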

Re: Functions in Spark SQL

2015-07-27 Thread fightf...@163.com
Hi there, I tested with sqlContext.sql("select funcName(param1, param2, ...) from tableName") and it just worked fine. Would you like to paste your test code here? And which version of Spark are you using? Best, Sun. fightf...@163.com From: vinod kumar Date: 2015-07-27 15:04 To: User Subject
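
A small sketch of registering and calling a UDF this way; the function name and logic are hypothetical:
  // register a simple UDF, then reference it from SQL
  sqlContext.udf.register("funcName", (a: Int, b: Int) => a + b)
  sqlContext.sql("select funcName(col1, col2) from tableName").show()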

Re: Re: Need help in setting up spark cluster

2015-07-23 Thread fightf...@163.com
suggest you first deploy a Spark standalone cluster to run some integration tests, and also consider running Spark on YARN for later development use cases. Best, Sun. fightf...@163.com From: Jeetendra Gangele Date: 2015-07-23 13:39 To: user Subject: Re: Need help in setting

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-05-12 Thread fightf...@163.com
Hi there, which version are you using? Actually, the problem seems to be gone after we changed our Spark version from 1.2.0 to 1.3.0. Not sure which internal changes did it. Best, Sun. fightf...@163.com From: Night Wolf Date: 2015-05-12 22:05 To: fightf...@163.com CC: Patrick Wendell; user; dev

Re: Cannot run the example in the Spark 1.3.0 following the document

2015-04-02 Thread fightf...@163.com
Hi there, you may need to add: import sqlContext.implicits._ Best, Sun fightf...@163.com From: java8964 Date: 2015-04-03 10:15 To: user@spark.apache.org Subject: Cannot run the example in the Spark 1.3.0 following the document I tried to check out what Spark SQL 1.3.0. I installed
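
For context, a small spark-shell sketch of why that import matters (the Record case class and data are made up); it brings in the implicit conversions, such as toDF(), that the documented examples rely on:
  import sqlContext.implicits._
  case class Record(id: Int, name: String)
  val df = sc.parallelize(Seq(Record(1, "a"), Record(2, "b"))).toDF()
  df.show()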

Re: Re: rdd.cache() not working ?

2015-04-01 Thread fightf...@163.com
Hi, still no luck with your suggestion. Best, Sun. fightf...@163.com From: Yuri Makhno Date: 2015-04-01 15:26 To: fightf...@163.com CC: Taotao.Li; user Subject: Re: Re: rdd.cache() not working ? The cache() method returns a new RDD so you have to use something like this: val person

Re: Re: rdd.cache() not working ?

2015-04-01 Thread fightf...@163.com
Hi, that is just the issue. After running person.cache we then run person.count; however, there is still no cache shown on the web UI Storage page. Thanks, Sun. fightf...@163.com From: Taotao.Li Date: 2015-04-01 14:02 To: fightfate CC: user Subject: Re: rdd.cache() not working

Re: Re: rdd.cache() not working ?

2015-04-01 Thread fightf...@163.com
sqlContext.cacheTable operation, we can see the cache results. Not sure what's happening here. If anyone can reproduce this issue, please let me know. Thanks, Sun fightf...@163.com From: Sean Owen Date: 2015-04-01 15:54 To: Yuri Makhno CC: fightf...@163.com; Taotao.Li; user Subject: Re: Re

rdd.cache() not working ?

2015-03-31 Thread fightf...@163.com
this for a little. Best, Sun. case class Person(id: Int, col1: String) val person = sc.textFile("hdfs://namenode_host:8020/user/person.txt").map(_.split(",")).map(p => Person(p(0).trim.toInt, p(1))) person.cache person.count fightf...@163.com

Re: RE: Building spark over specified tachyon

2015-03-15 Thread fightf...@163.com
Thanks, Jerry, I got it that way. Just wanted to make sure whether there is some option for directly specifying the tachyon version. fightf...@163.com From: Shao, Saisai Date: 2015-03-16 11:10 To: fightf...@163.com CC: user Subject: RE: Building spark over specified tachyon I think you could change

Building spark over specified tachyon

2015-03-15 Thread fightf...@163.com
. fightf...@163.com

Re: Re: Building spark over specified tachyon

2015-03-15 Thread fightf...@163.com
Thanks haoyuan. fightf...@163.com From: Haoyuan Li Date: 2015-03-16 12:59 To: fightf...@163.com CC: Shao, Saisai; user Subject: Re: RE: Building spark over specified tachyon Here is a patch: https://github.com/apache/spark/pull/4867 On Sun, Mar 15, 2015 at 8:46 PM, fightf...@163.com fightf

Re: deploying Spark on standalone cluster

2015-03-14 Thread fightf...@163.com
Hi, you may want to check your Spark environment config in spark-env.sh, specifically SPARK_LOCAL_IP, and check whether you modified that value, which may default to localhost. Thanks, Sun. fightf...@163.com From: sara mustafa Date: 2015-03-14 15:13 To: user Subject: deploying
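
A small sketch of the relevant spark-env.sh entries for a standalone setup; the address below is a placeholder for the node's reachable IP:
  # in conf/spark-env.sh on each node
  export SPARK_LOCAL_IP=192.168.1.10
  export SPARK_MASTER_IP=192.168.1.10   # on the master node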

Re: Problem connecting to HBase

2015-03-13 Thread fightf...@163.com
Hi there, you may want to check your hbase config; e.g. the following property can be changed to /hbase: <property> <name>zookeeper.znode.parent</name> <value>/hbase-unsecure</value> </property> fightf...@163.com From: HARIPRIYA AYYALASOMAYAJULA Date: 2015-03-14 10:47 To: user

Re: Spark code development practice

2015-03-05 Thread fightf...@163.com
Hi, you can first set up a Scala IDE to develop and debug your Spark program, let's say IntelliJ IDEA or Eclipse. Thanks, Sun. fightf...@163.com From: Xi Shen Date: 2015-03-06 09:19 To: user@spark.apache.org Subject: Spark code development practice Hi, I am new to Spark. I see every

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-12 Thread fightf...@163.com
application? Does Spark provide such configs for achieving that goal? We know that this is tricky to get working. Just want to know how this could be resolved, or whether there is some other possible channel we did not cover. Looking forward to your kind advice. Thanks, Sun. fightf...@163.com

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-11 Thread fightf...@163.com
Hi, we really have no adequate solution for this issue yet. Hoping for any available analysis rules or hints. Thanks, Sun. fightf...@163.com From: fightf...@163.com Date: 2015-02-09 11:56 To: user; dev Subject: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data

parameter passed for AppendOnlyMap initialCapacity

2015-02-09 Thread fightf...@163.com
for supporting modifying this? Many thanks, fightf...@163.com

Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-08 Thread fightf...@163.com
Hi, the problem still exists. Would any experts take a look at this? Thanks, Sun. fightf...@163.com From: fightf...@163.com Date: 2015-02-06 17:54 To: user; dev Subject: Sort Shuffle performance issues about using AppendOnlyMap for large data sets Hi all, recently we caught performance

Re: Hi: hadoop 2.5 for spark

2015-01-30 Thread fightf...@163.com
Hi, Siddharth, you can rebuild Spark with maven by specifying -Dhadoop.version=2.5.0. Thanks, Sun. fightf...@163.com From: Siddharth Ubale Date: 2015-01-30 15:50 To: user@spark.apache.org Subject: Hi: hadoop 2.5 for spark Hi, I am a beginner with Apache Spark. Can anyone let me know
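
A hedged sketch of the full build command for the Spark 1.x line, where the hadoop-2.4 profile also covers 2.5.x (profile names vary by Spark version, and -Pyarn is only needed for YARN support):
  mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests clean package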

Re: Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread fightf...@163.com
) List(kv) } Thanks, Sun fightf...@163.com From: Jim Green Date: 2015-01-28 04:44 To: Ted Yu CC: user Subject: Re: Bulk loading into hbase using saveAsNewAPIHadoopFile I used the below code, and it still failed with the same error. Anyone have experience with bulk loading
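
Since the code in this thread is truncated above, here is only a generic, hedged sketch of the usual saveAsNewAPIHadoopFile bulk-load pattern; the column family "cf", qualifier "col", output path and sample data are placeholders, and older HBase releases use HFileOutputFormat instead of HFileOutputFormat2:
  import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable
  import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
  import org.apache.hadoop.hbase.util.Bytes

  val hbaseConf = HBaseConfiguration.create()
  // HFiles must be written in row-key order, hence the sortByKey before mapping to KeyValue
  val kvRdd = sc.parallelize(Seq(("row1", "v1"), ("row2", "v2")))
    .sortByKey()
    .map { case (rowKey, value) =>
      val kv = new KeyValue(Bytes.toBytes(rowKey), Bytes.toBytes("cf"),
        Bytes.toBytes("col"), Bytes.toBytes(value))
      (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), kv)
    }
  kvRdd.saveAsNewAPIHadoopFile("/tmp/hfiles",
    classOf[ImmutableBytesWritable], classOf[KeyValue],
    classOf[HFileOutputFormat2], hbaseConf)
  // the generated HFiles are then loaded with LoadIncrementalHFiles (completebulkload)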