Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread arthur.hk.c...@gmail.com
Hi, I need to use Spark with HBase 0.98 and tried to compile Spark 1.0.2 with HBase 0.98. My steps: wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz tar -vxf spark-1.0.2.tgz cd spark-1.0.2 edit project/SparkBuild.scala, set HBASE_VERSION // HBase version; set as appropriate. val H
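A minimal sketch of the edit described in this message, assuming the HBASE_VERSION value sits in project/SparkBuild.scala as quoted above (the 0.98.5-hadoop2 version string comes from a later reply in this thread; HBase 0.98 is split into multiple modules, so the SPARK-1297 patch discussed further down may also be needed):

    // project/SparkBuild.scala (excerpt)
    // HBase version; set as appropriate.
    val HBASE_VERSION = "0.98.5-hadoop2"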

Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread arthur.hk.c...@gmail.com
(correction: "Compilation Error: Spark 1.0.2 with HBase 0.98", please ignore if duplicated) Hi, I need to use Spark with HBase 0.98 and tried to compile Spark 1.0.2 with HBase 0.98. My steps: wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2.tgz tar -vxf spark-1.0.2.tgz cd spark-1.0.2

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread arthur.hk.c...@gmail.com
e/spark/pull/1893 > > > On Wed, Aug 27, 2014 at 6:57 PM, arthur.hk.c...@gmail.com > wrote: > (correction: "Compilation Error: Spark 1.0.2 with HBase 0.98", please > ignore if duplicated) > > > Hi, > > I need to use Spark with HBase 0.98 and tried

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread arthur.hk.c...@gmail.com
> BTW 0.98.5 has been released - you can specify 0.98.5-hadoop2 in the pom.xml > > Cheers > > > On Wed, Aug 27, 2014 at 7:18 PM, arthur.hk.c...@gmail.com > wrote: > Hi Ted, > > Thank you so much!! > > As I am new to Spark, can you please advise the ste

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-27 Thread arthur.hk.c...@gmail.com
(offset -40 lines). Hunk #2 succeeded at 195 (offset -40 lines). On 28 Aug, 2014, at 10:53 am, Ted Yu wrote: > Can you use this command ? > > patch -p1 -i 1893.patch > > Cheers > > > On Wed, Aug 27, 2014 at 7:41 PM, arthur.hk.c...@gmail.com > wrote: > Hi Ted, &g

Compilation FAILURE : Spark 1.0.2 / Project Hive (0.13.1)

2014-08-27 Thread arthur.hk.c...@gmail.com
Hi, I use Hadoop 2.4.1, HBase 0.98.5, Zookeeper 3.4.6 and Hive 0.13.1. I just tried to compile Spark 1.0.2, but got an error on "Spark Project Hive"; can you please advise which repository has "org.spark-project.hive:hive-metastore:jar:0.13.1"? FYI, below is my repository setting in maven which

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-28 Thread arthur.hk.c...@gmail.com
Cheers On Wed, Aug 27, 2014 at 7:57 PM, arthur.hk.c...@gmail.com <arthur.hk.c...@gmail.com> wrote: Hi Ted, Thanks. Tried [patch -p1 -i 1893.patch] (Hunk #1 FAILED at 45.) Is this normal? Regards Arthur patch -p1 -i 1893.patch patching file examples/pom.xml Hunk #1 FAILED at 45. Hunk #2 succeeded a

Re: Compilation Error: Spark 1.0.2 with HBase 0.98

2014-08-28 Thread arthur.hk.c...@gmail.com
pm, Ted Yu wrote: > I see 0.98.5 in dep.txt > > You should be good to go. > > > On Thu, Aug 28, 2014 at 3:16 AM, arthur.hk.c...@gmail.com > wrote: > Hi, > > tried > mvn -Phbase-hadoop2,hadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests > dependency:tree

SPARK-1297 patch error (spark-1297-v4.txt )

2014-08-28 Thread arthur.hk.c...@gmail.com
Hi, I have just tried to apply the patch of SPARK-1297: https://issues.apache.org/jira/browse/SPARK-1297 There are two files in it, named spark-1297-v2.txt and spark-1297-v4.txt respectively. When applying the second one, I got "Hunk #1 FAILED at 45". Can you please advise how to fix it in order

Re: SPARK-1297 patch error (spark-1297-v4.txt )

2014-08-28 Thread arthur.hk.c...@gmail.com
docs/building-with-maven.md |+++ docs/building-with-maven.md -- File to patch: Please advise Regards Arthur On 29 Aug, 2014, at 12:50 am, arthur.hk.c...@gmail.com wrote: > Hi, > > I have just tried to apply the patch of SPARK-1297: > https://issues.apach

Re: SPARK-1297 patch error (spark-1297-v4.txt )

2014-08-28 Thread arthur.hk.c...@gmail.com
-Dhbase.profile=hadoop2 -Phadoop-2.4,yarn -Dhadoop.version=2.4.1 > -DskipTests clean package > > Patch v5 is @ level 0 - you don't need to use -p1 in the patch command. > > Cheers > > > On Thu, Aug 28, 2014 at 9:50 AM, arthur.hk.c...@gmail.com > wrote: >

org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
Hi, I use Hadoop 2.4.1 and HBase 0.98.5 with snappy enabled in both Hadoop and HBase. With the default setting in Spark 1.0.2, when trying to load a file I got "Class org.apache.hadoop.io.compress.SnappyCodec not found". Can you please advise how to enable snappy in Spark? Regards Arthur scala> i

Re: org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
park? Regards Arthur On 29 Aug, 2014, at 2:39 am, arthur.hk.c...@gmail.com wrote: > Hi, > > I use Hadoop 2.4.1 and HBase 0.98.5 with snappy enabled in both Hadoop and > HBase. > With default setting in Spark 1.0.2, when trying to load a file I got "Class > org.apache.had

Re: org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
tCodecClasses(CompressionCodecFactory.java:128) ... 62 more Any idea to fix this issue? Regards Arthur On 29 Aug, 2014, at 2:58 am, arthur.hk.c...@gmail.com wrote: > Hi, > > my check native result: > > hadoop checknative > 14/08/29 02:54:51 WARN bzip2.Bzip2Factory: Failed to

Re: org.apache.hadoop.io.compress.SnappyCodec not found

2014-08-28 Thread arthur.hk.c...@gmail.com
Hi, I fixed the issue by copying libsnappy.so to the Java JRE. Regards Arthur On 29 Aug, 2014, at 8:12 am, arthur.hk.c...@gmail.com wrote: > Hi, > > If I change my etc/hadoop/core-site.xml > > from > > io.compression.codecs > > org.apache.had
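The thread only confirms the fix of copying libsnappy.so into the JRE. A less invasive alternative, sketched here as an assumption (the paths are hypothetical), is to point Spark at the directory that already holds the Hadoop native libraries:

    # conf/spark-defaults.conf -- adjust the path to your Hadoop install
    spark.executor.extraLibraryPath   /usr/local/hadoop/lib/native
    spark.driver.extraLibraryPath     /usr/local/hadoop/lib/native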

Spark Hive max key length is 767 bytes

2014-08-28 Thread arthur.hk.c...@gmail.com
(Please ignore if duplicated) Hi, I use Spark 1.0.2 with Hive 0.13.1. I have already set the hive mysql database to latin1; mysql: alter database hive character set latin1; Spark: scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) scala> hiveContext.hql("create table tes

Re: Spark Hive max key length is 767 bytes

2014-08-29 Thread arthur.hk.c...@gmail.com
ngth is 767 bytes com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) Can anyone please help? Regards Arthur On 29 Aug, 2014, at 12:47 pm, arthur.hk.c...@gmail.c

Re: Spark Hive max key length is 767 bytes

2014-08-30 Thread arthur.hk.c...@gmail.com
12 and 13. > > > On Fri, Aug 29, 2014 at 4:38 AM, arthur.hk.c...@gmail.com > wrote: > Hi, > > > Tried the same thing in HIVE directly without issue: > > HIVE: > hive> create table test_datatype2 (testbigint bigint ); > OK > Time taken: 0.708 seconds &

Re: Spark Hive max key length is 767 bytes

2014-08-30 Thread arthur.hk.c...@gmail.com
am, Denny Lee wrote: > Oh, you may be running into an issue with your MySQL setup actually, try > running > > alter database metastore_db character set latin1 > > so that way Hive (and the Spark HiveContext) can execute properly against the > metastore. > > > On Aug

Spark Master/Slave and HA

2014-08-30 Thread arthur.hk.c...@gmail.com
Hi, I have a few questions about the Spark Master and Slave setup. Here I have 5 Hadoop nodes (n1, n2, n3, n4, and n5); at the moment I run Spark on these nodes: n1: Hadoop Active Name node, Hadoop Slave, Spark Active Master
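For the HA part of this question, Spark's standalone mode supports ZooKeeper-based master recovery; a hedged sketch follows (the ZooKeeper quorum address is an assumption, the thread does not show one, though ZooKeeper 3.4.6 is mentioned elsewhere in this archive):

    # conf/spark-env.sh on every master candidate (e.g. n1 and n2)
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=n1:2181,n2:2181,n3:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"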

Spark and Shark Node: RAM Allocation

2014-08-30 Thread arthur.hk.c...@gmail.com
Hi, Is there any formula to calculate proper RAM allocation values for Spark and Shark based on Physical RAM, HADOOP and HBASE RAM usage? e.g. if a node has 32GB physical RAM spark-defaults.conf spark.executor.memory ?g spark-env.sh export SPARK_WORKER_MEMORY=? export HADOOP_HEAP

Spark and Shark

2014-09-01 Thread arthur.hk.c...@gmail.com
Hi, I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling from source). spark: 1.0.2 shark: 0.9.2 hadoop: 2.4.1 java: java version “1.7.0_67” protobuf: 2.5.0 I have tried the smoke test in shark but got “java.util.NoSuchElementException” error, can you please advise how

Re: Dependency Problem with Spark / ScalaTest / SBT

2014-09-10 Thread arthur.hk.c...@gmail.com
Hi, What is your SBT command and the parameters? Arthur On 10 Sep, 2014, at 6:46 pm, Thorsten Bergler wrote: > Hello, > > I am writing a Spark App which is already working so far. > Now I started to build also some UnitTests, but I am running into some > dependecy problems and I cannot fin

Re: Spark SQL -- more than two tables for join

2014-09-10 Thread arthur.hk.c...@gmail.com
Hi, Maybe you can take a look at the following. http://databricks.com/blog/2014/03/26/spark-sql-manipulating-structured-data-using-spark-2.html Good luck. Arthur On 10 Sep, 2014, at 9:09 pm, arunshell87 wrote: > > Hi, > > I too had tried SQL queries with joins, MINUS, subqueries etc bu

Re: Spark SQL -- more than two tables for join

2014-09-10 Thread arthur.hk.c...@gmail.com
Hi, Some findings: 1) Spark SQL does not support multiple joins 2) Spark left join: has performance issues 3) Spark SQL's cache table: does not support two-tier queries 4) Spark SQL does not support repartition Arthur On 10 Sep, 2014, at 10:22 pm, arthur.hk.c...@gmail.com wrote: >

unable to create new native thread

2014-09-11 Thread arthur.hk.c...@gmail.com
Hi, I am trying the Spark sample program "SparkPi" and got the error "unable to create new native thread"; how do I resolve this? 14/09/11 21:36:16 INFO scheduler.DAGScheduler: Completed ResultTask(0, 644) 14/09/11 21:36:16 INFO scheduler.TaskSetManager: Finished TID 643 in 43 ms on node1 (progress
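"unable to create new native thread" usually points at an OS-level limit on the worker node rather than at Spark itself; a hedged check, assuming a Linux node and a dedicated spark user (the user name and limit values are illustrative):

    # per-user process/thread limit on the worker node
    ulimit -u
    # raise it in /etc/security/limits.conf if it is low, for example:
    #   sparkuser  soft  nproc  32768
    #   sparkuser  hard  nproc  65536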

object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
Hi, I have tried to run HBaseTest.scala, but I got the following errors; any ideas on how to fix them? Q1) scala> package org.apache.spark.examples :1: error: illegal start of definition package org.apache.spark.examples Q2) scala> import org.apache.hadoop.hbase.mapreduce.TableInputF
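The two REPL errors above have different causes: a package declaration is not legal inside the spark-shell, and the HBase classes are not on the shell's classpath. A hedged sketch of launching the shell with the HBase jars added (the jar paths are hypothetical):

    # launch spark-shell with the HBase client jars on the classpath (paths are assumptions)
    bin/spark-shell --driver-class-path \
      /opt/hbase/lib/hbase-common-0.98.5-hadoop2.jar:/opt/hbase/lib/hbase-client-0.98.5-hadoop2.jar

    scala> import org.apache.hadoop.hbase.HBaseConfiguration
    scala> import org.apache.hadoop.hbase.mapreduce.TableInputFormat  // no package declaration needed in the REPL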

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
On Sun, Sep 14, 2014 at 7:36 AM, arthur.hk.c...@gmail.com <arthur.hk.c...@gmail.com> wrote: Hi, I have tried to run HBaseTest.scala, but I got the following errors; any ideas on how to fix them? Q1) scala> package org.apache.spark.examples :1: error: illegal start of definition package or

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
, Ted Yu wrote: > spark-1297-v5.txt is level 0 patch > > Please use spark-1297-v5.txt > > Cheers > > On Sun, Sep 14, 2014 at 8:06 AM, arthur.hk.c...@gmail.com > wrote: > Hi, > > Thanks!! > > I tried to apply the patches, both spark-1297-v2.txt and s

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
Hi, My bad. Tried again, worked. patch -p0 -i spark-1297-v5.txt patching file docs/building-with-maven.md patching file examples/pom.xml Thanks! Arthur On 14 Sep, 2014, at 11:38 pm, arthur.hk.c...@gmail.com wrote: > Hi, > > Thanks! > > patch -p0 -i spark-1297-v5.txt

Re: object hbase is not a member of package org.apache.hadoop

2014-09-14 Thread arthur.hk.c...@gmail.com
I applied the patch on master branch without rejects. > > If you use spark 1.0.2, use pom.xml attached to the JIRA. > > On Sun, Sep 14, 2014 at 8:38 AM, arthur.hk.c...@gmail.com > wrote: > Hi, > > Thanks! > > patch -p0 -i spark-1297-v5.txt > patching file docs/buildi

Re: Spark Hive max key length is 767 bytes

2014-09-25 Thread arthur.hk.c...@gmail.com
Hi, Fixed the issue by downgrading Hive from 0.13.1 to 0.12.0; it works well now. Regards On 31 Aug, 2014, at 7:28 am, arthur.hk.c...@gmail.com wrote: > Hi, > > Already done but still get the same error: > > (I use HIVE 0.13.1 Spark 1.0.2, Hadoop 2.4.1) > > S

Re: SparkSQL on Hive error

2014-10-03 Thread arthur.hk.c...@gmail.com
hi, I have just tested the same command, it works here, can you please provide your create table command? regards Arthur scala> hiveContext.hql("show tables") warning: there were 1 deprecation warning(s); re-run with -deprecation for details 2014-10-03 17:14:33,575 INFO [main] parse.ParseDriv

How to save Spark log into file

2014-10-03 Thread arthur.hk.c...@gmail.com
Hi, How can the Spark log be saved into a file instead of being shown on the console? Below is my conf/log4j.properties conf/log4j.properties ### # Root logger option log4j.rootLogger=INFO, file # Direct log messages to a log file log4j.appender.file=org.apache.log4j.RollingFileAppender #Redirect
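A complete minimal conf/log4j.properties along the lines started above, hedged as one possible configuration (the log file path and rotation limits are assumptions):

    # Root logger option
    log4j.rootLogger=INFO, file
    # Direct log messages to a rolling log file instead of the console
    log4j.appender.file=org.apache.log4j.RollingFileAppender
    log4j.appender.file.File=/var/log/spark/spark.log
    log4j.appender.file.MaxFileSize=10MB
    log4j.appender.file.MaxBackupIndex=10
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n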

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread arthur.hk.c...@gmail.com
Wonderful !! On 11 Oct, 2014, at 12:00 am, Nan Zhu wrote: > Great! Congratulations! > > -- > Nan Zhu > On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: > >> Brilliant stuff ! Congrats all :-) >> This is indeed really heartening news ! >> >> Regards, >> Mridul >> >> >> On

How To Implement More Than One Subquery in Scala/Spark

2014-10-11 Thread arthur.hk.c...@gmail.com
Hi, My Spark version is v1.1.0 and Hive is 0.12.0. I need to use more than one subquery in my Spark SQL; below are my sample table structures and a SQL that contains more than one subquery. Question 1: How to load a HIVE table into Scala/Spark? Question 2: How to implement a SQL_WITH_MORE_THAN_O
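For Question 1, a hedged Scala sketch of pulling a Hive table into the shell through HiveContext (the lineitem table and its columns follow the TPC-H-style names used elsewhere in these threads; the actual schema is an assumption):

    scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    scala> val lineitem = hiveContext.hql("SELECT l_orderkey, l_extendedprice, l_discount FROM lineitem")
    scala> lineitem.registerTempTable("lineitem_rdd")  // usable from further hql()/sql() calls
    // one way around subquery limits: materialise each subquery separately and join the results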

Re: How To Implement More Than One Subquery in Scala/Spark

2014-10-13 Thread arthur.hk.c...@gmail.com
n is to run two separate map jobs and join their results. Keep in mind > that another useful technique is to execute the groupByKey routine , > particularly if you want to operate on a particular variable. > > On Oct 11, 2014 11:09 AM, "arthur.hk.c...@gmail.com" > wrote

Spark Hive Snappy Error

2014-10-16 Thread arthur.hk.c...@gmail.com
Hi, When trying Spark with Hive table, I got the “java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I” error, val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) sqlContext.sql(“select count(1) from q8_national_market_share sqlContext.sql("select co

Spark/HIVE Insert Into values Error

2014-10-17 Thread arthur.hk.c...@gmail.com
Hi, When trying to insert records into HIVE, I got an error. My Spark is 1.1.0 and Hive 0.12.0. Any idea what would be wrong? Regards Arthur hive> CREATE TABLE students (name VARCHAR(64), age INT, gpa int); OK hive> INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1); NoViable

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
ys reproduce this issue, Is this issue related to some specific data > sets, would you mind giving me some information about you workload, Spark > configuration, JDK version and OS version? > > Thanks > Jerry > > From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi, FYI, I use snappy-java-1.0.4.1.jar Regards Arthur On 22 Oct, 2014, at 8:59 pm, Shao, Saisai wrote: > Thanks a lot, I will try to reproduce this in my local settings and dig into > the details, thanks for your information. > > > BR > Jerry > > From:

ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi, I just tried sample PI calculation on Spark Cluster, after returning the Pi result, it shows ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(m37,35662) not found ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://m33:7077 --e

Re: ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi, I have managed to resolve it; it was caused by a wrong setting. Please ignore this. Regards Arthur On 23 Oct, 2014, at 5:14 am, arthur.hk.c...@gmail.com wrote: > > 14/10/23 05:09:04 WARN ConnectionManager: All connections not cleaned up >

Spark: Order by Failed, java.lang.NullPointerException

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi, I got java.lang.NullPointerException. Please help! sqlContext.sql("select l_orderkey, l_linenumber, l_partkey, l_quantity, l_shipdate, L_RETURNFLAG, L_LINESTATUS from lineitem limit 10").collect().foreach(println); 2014-10-23 08:20:12,024 INFO [sparkDriver-akka.actor.default-dispatcher-3

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi, Please find the attached file. (RTF-encoded attachment content omitted)

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
Hi, May I know where to configure Spark to load libhadoop.so? Regards Arthur On 23 Oct, 2014, at 11:31 am, arthur.hk.c...@gmail.com wrote: > Hi, > > Please find the attached file. > > > > > my spark-default.xml > # Default system properties included when runnin

Re: Spark Hive Snappy Error

2014-10-22 Thread arthur.hk.c...@gmail.com
hadoop-snappy-0.0.1-SNAPSHOT.jar" > > But for spark itself, it depends on snappy-0.2.jar. Is there any possibility > that this problem caused by different version of snappy? > > Thanks > Jerry > > From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com] > S

Aggregation Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:

2014-10-23 Thread arthur.hk.c...@gmail.com
Hi, I got a TreeNodeException, and have a few questions: Q1) How should I do aggregation in Spark? Can I use aggregation directly in SQL? Or Q2) should I use SQL to load the data to form an RDD and then use Scala to do the aggregation? Regards Arthur My SQL (good one, without aggregation): sqlContext.sql("SELEC

Re: Aggregation Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:

2014-10-23 Thread arthur.hk.c...@gmail.com
item’; Regards Arthur On 23 Oct, 2014, at 9:36 pm, Yin Huai wrote: > Hello Arthur, > > You can do aggregations in SQL. How did you create LINEITEM? > > Thanks, > > Yin > > On Thu, Oct 23, 2014 at 8:54 AM, arthur.hk.c...@gmail.com > wrote: > Hi, > &
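A hedged Scala example of the point made in the reply above, that aggregation can be expressed directly in SQL (sqlContext is assumed to be the HiveContext from the original message, and the column names follow the lineitem schema quoted in these threads):

    scala> sqlContext.sql(
         |   "SELECT l_returnflag, SUM(l_extendedprice * (1 - l_discount)) AS revenue " +
         |   "FROM lineitem GROUP BY l_returnflag").collect().foreach(println)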

Spark 1.1.0 and Hive 0.12.0 Compatibility Issue

2014-10-23 Thread arthur.hk.c...@gmail.com
(Please ignore if duplicated) Hi, My Spark is 1.1.0 and Hive is 0.12. I tried to run the same query in both Hive 0.12.0 and Spark 1.1.0; HiveQL works while Spark SQL fails. hive> select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue, o_orderdate, o_shippriority from customer c

Re: Spark 1.1.0 and Hive 0.12.0 Compatibility Issue

2014-10-24 Thread arthur.hk.c...@gmail.com
4/10/25 06:50:15 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool Regards Arthur On 24 Oct, 2014, at 6:56 am, Michael Armbrust wrote: > Can you show the DDL for the table? It looks like the SerDe might be saying > it will produce a decimal type but i

Re: Spark: Order by Failed, java.lang.NullPointerException

2014-10-24 Thread arthur.hk.c...@gmail.com
Thanks > Best Regards > > On Thu, Oct 23, 2014 at 5:59 AM, arthur.hk.c...@gmail.com > wrote: > Hi, > > I got java.lang.NullPointerException. Please help! > > > sqlContext.sql("select l_orderkey, l_linenumber, l_partkey, l_quantity, > l_shipdate, L_RETURNFLAG

Re: Spark/HIVE Insert Into values Error

2014-10-25 Thread arthur.hk.c...@gmail.com
Hi, I have already found the way to do "insert into HIVE_TABLE values (…..)". Regards Arthur On 18 Oct, 2014, at 10:09 pm, Cheng Lian wrote: > Currently Spark SQL uses Hive 0.12.0, which doesn't support the INSERT INTO > ... VALUES ... syntax. > > On 10/18/14 1:3
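The reply quoted here confirms that the Hive 0.12.0 bundled with Spark SQL has no INSERT INTO ... VALUES syntax. One common workaround, sketched with the students table from the earlier message (the staging file path and delimiter are assumptions), is to load the rows from a file instead:

    -- assumes students was created with FIELDS TERMINATED BY '\t'
    -- /tmp/students.txt contains one tab-separated row: fred flintstone, 35, 1
    LOAD DATA LOCAL INPATH '/tmp/students.txt' INTO TABLE students;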

Spark 1.1.0 on Hive 0.13.1

2014-10-29 Thread arthur.hk.c...@gmail.com
Hi, My Hive is 0.13.1; how do I make Spark 1.1.0 run on Hive 0.13? Please advise. Or, any news about when Spark 1.1.0 on Hive 0.13.1 will be available? Regards Arthur

Re: Spark 1.1.0 on Hive 0.13.1

2014-10-29 Thread arthur.hk.c...@gmail.com
t; > On 10/29/14 7:43 PM, arthur.hk.c...@gmail.com wrote: >> Hi, >> >> My Hive is 0.13.1, how to make Spark 1.1.0 run on Hive 0.13? Please advise. >> >> Or, any news about when will Spark 1.1.0 on Hive

Re: OOM with groupBy + saveAsTextFile

2014-11-01 Thread arthur.hk.c...@gmail.com
Hi, FYI as follows. Could you post your heap size settings as well as your Spark app code? Regards Arthur 3.1.3 Detail Message: Requested array size exceeds VM limit The detail message Requested array size exceeds VM limit indicates that the application (or APIs used by that application) attem
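The "Requested array size exceeds VM limit" passage quoted above usually means one very large object, such as a single huge group, is being built in memory. A commonly suggested mitigation, hedged here because the original job is not shown (the input path and key extraction are hypothetical), is to replace groupByKey with a combining operation:

    // hypothetical pair RDD; the real job from the thread is not shown
    val pairs = sc.textFile("hdfs:///data/input").map(line => (line.split("\t")(0), 1L))

    // groupByKey buffers every value of a key in memory before writing:
    //   pairs.groupByKey().mapValues(_.size).saveAsTextFile("hdfs:///data/out-grouped")

    // reduceByKey combines map-side and never materialises whole groups
    pairs.reduceByKey(_ + _).saveAsTextFile("hdfs:///data/out-reduced")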