question about spark streaming

2015-08-10 Thread sequoiadb
Hi guys, I have a question about Spark Streaming. There’s an application that keeps sending transaction records into a Spark stream at about 50k TPS. Each record represents sales information, including customer id / product id / time / price columns. The application is required to monitor the change
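A minimal sketch of one way such a stream might be consumed with Spark Streaming (the socket source, the comma-separated record format, and the Sale case class are illustrative assumptions, not from the original post):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Sale mirrors the columns described above: customer id / product id / time / price.
    case class Sale(customerId: String, productId: String, time: Long, price: Double)

    object SalesMonitor {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("SalesMonitor")
        // 1-second batches; at ~50k TPS each micro-batch holds roughly 50k records.
        val ssc = new StreamingContext(conf, Seconds(1))

        val lines = ssc.socketTextStream("localhost", 9999)
        val sales = lines.map { line =>
          val f = line.split(",")
          Sale(f(0), f(1), f(2).toLong, f(3).toDouble)
        }

        // Example aggregation: revenue per product within each batch.
        sales.map(s => (s.productId, s.price))
             .reduceByKey(_ + _)
             .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

At this rate the usual first tuning step is checking that each batch's processing time stays below the batch interval.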

question about sparksql caching

2015-05-14 Thread sequoiadb
Hi all, we are planning to use SparkSQL in a DW system. There’s a question about the caching mechanism of SparkSQL. For example, if I have a SQL statement like sqlContext.sql(“select c1, sum(c2) from T1, T2 where T1.key=T2.key group by c1”).cache(), is it going to cache the final result or the raw data
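For reference, sqlContext.sql(...) returns a DataFrame, and calling .cache() on it marks the query result for caching, not the underlying tables; the result is materialized lazily on the first action. A short sketch (assuming df1 and df2 are DataFrames loaded elsewhere and registered as T1 and T2):

    // Register the source DataFrames under the table names used in the SQL.
    df1.registerTempTable("T1")
    df2.registerTempTable("T2")

    val agg = sqlContext.sql(
      "select c1, sum(c2) from T1, T2 where T1.key = T2.key group by c1")

    agg.cache()   // marks the *result* of the query for caching
    agg.count()   // first action materializes the cached result
    agg.show()    // subsequent actions reuse the cached rows

    // To cache the raw table data instead, cache the source tables explicitly:
    sqlContext.cacheTable("T1")
    sqlContext.cacheTable("T2")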

what is the best way to transfer data from RDBMS to spark?

2015-04-24 Thread sequoiadb
If I run Spark in standalone mode (not YARN mode), is there any tool like Sqoop that is able to transfer data from an RDBMS into Spark storage? Thanks
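One common answer: Spark itself can read from an RDBMS through its JDBC data source, no Sqoop needed. A hedged sketch (URL, table name, and credentials are placeholders; sqlContext.read is the Spark 1.4+ API, while 1.3 uses the equivalent sqlContext.load("jdbc", options)):

    // Placeholder connection details; adjust for your database and JDBC driver.
    val jdbcDF = sqlContext.read.format("jdbc").options(Map(
      "url"      -> "jdbc:mysql://dbhost:3306/sales",
      "dbtable"  -> "transactions",
      "user"     -> "spark",
      "password" -> "secret",
      // split the table read across partitions for parallel loading
      "partitionColumn" -> "id",
      "lowerBound"      -> "1",
      "upperBound"      -> "1000000",
      "numPartitions"   -> "8"
    )).load()

    jdbcDF.cache()                           // keep it in Spark's memory store
    jdbcDF.registerTempTable("transactions") // query it with SparkSQL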

how to specify multiple masters in sbin/start-slaves.sh script?

2015-03-19 Thread sequoiadb
Hey guys, not sure if I’m the only one who got this. We are building a highly-available standalone Spark environment, using ZK with 3 masters in the cluster. However, sbin/start-slaves.sh calls start-slave.sh for each member in the conf/slaves file, and specifies the master using $SPARK_MASTER_IP and
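For reference, a standalone worker accepts a comma-separated list of masters in the spark:// URL, so one workaround is to start each worker directly with the full list (hostnames below are placeholders):

    # On each worker node: register with all three ZK-managed masters.
    # The worker follows whichever master ZooKeeper currently elects leader.
    ./sbin/start-slave.sh spark://master1:7077,master2:7077,master3:7077

    # Note: on Spark 1.3 and earlier, start-slave.sh also expects a worker
    # instance number as its first argument before the master URL.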

sparksql native jdbc driver

2015-03-18 Thread sequoiadb
Hey guys, in my understanding SparkSQL only supports JDBC connections through the Hive Thrift server. Is this correct? Thanks
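That matches the docs of that era: SparkSQL exposes JDBC/ODBC through the bundled HiveServer2-compatible Thrift server. A quick sketch using the default port (host and master URL are placeholders for your cluster):

    # Start the Thrift JDBC/ODBC server bundled with Spark.
    ./sbin/start-thriftserver.sh --master spark://master:7077

    # Connect with the bundled beeline client over the standard Hive JDBC URL.
    ./bin/beeline -u jdbc:hive2://localhost:10000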

building all modules in spark by mvn

2015-03-13 Thread sequoiadb
Guys, is there any easier way to build all modules with mvn? Right now if I run “mvn package” in the Spark root directory I get:
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [ 8.327 s]
[INFO] Spark Project Networking ...
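A hedged sketch of the usual shortcuts (flags are standard Maven plus the Spark build docs of that era; the module path sql/core is just an example):

    # Build everything with the bundled Maven wrapper, skipping tests.
    build/mvn -DskipTests clean package

    # Build a single module plus whatever it depends on (-pl with -am).
    build/mvn -pl sql/core -am -DskipTests package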

Unable to stop Worker in standalone mode by sbin/stop-all.sh

2015-03-12 Thread sequoiadb
/data/sequoiadb-driver-1.10.jar,/data/spark-sequoiadb-0.0.1-SNAPSHOT.jar::/data/spark/conf:/data/spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.4.0.jar
-XX:MaxPermSize=128m
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=centos-151:2181,centos-152:2181

Re: Unable to stop Worker in standalone mode by sbin/stop-all.sh

2015-03-12 Thread sequoiadb
job that periodically cleans up the /tmp dir? Cheers On Thu, Mar 12, 2015 at 6:18 PM, sequoiadb mailing-list-r...@sequoiadb.com wrote: Checking the script, it seems spark-daemon.sh is unable to stop the worker $ ./spark-daemon.sh stop
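The hint about /tmp cleanup is likely the root cause: spark-daemon.sh writes its PID files under /tmp by default, and if a tmp cleaner deletes them, stop-all.sh no longer knows which process to kill. A hedged fix is to point SPARK_PID_DIR somewhere persistent (the path below is a placeholder):

    # conf/spark-env.sh on every node: keep daemon PID files out of /tmp.
    export SPARK_PID_DIR=/var/run/spark

    # After restarting the daemons, stop-all.sh can find the PIDs again.
    ./sbin/stop-all.sh

    # An already-orphaned worker can be found and killed by hand:
    ps aux | grep org.apache.spark.deploy.worker.Worker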