[ANNOUNCE] Apache Kyuubi released 1.7.3

2023-09-25 Thread Zhen Wang
Hi all, The Apache Kyuubi community is pleased to announce that Apache Kyuubi 1.7.3 has been released! Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses. Kyuubi provides a pure SQL gateway through Thrift JDBC/ODBC interface for

[ANNOUNCE] Apache Kyuubi released 1.7.2

2023-09-18 Thread Zhen Wang
Hi all, The Apache Kyuubi community is pleased to announce that Apache Kyuubi 1.7.2 has been released! Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses. Kyuubi provides a pure SQL gateway through Thrift JDBC/ODBC interface for

read compressed hdfs files using SparkContext.textFile?

2015-09-08 Thread shenyan zhen
Hi, For HDFS files written with the code below: rdd.saveAsTextFile(getHdfsPath(...), classOf[org.apache.hadoop.io.compress.GzipCodec]) I can see the HDFS files being generated: 0 /lz/streaming/am/144173460/_SUCCESS 1.6 M /lz/streaming/am/144173460/part-0.gz 1.6 M
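As the thread's answer turns out, files written with a compression codec read back transparently: SparkContext.textFile picks the codec from the file extension via Hadoop's codec factory, so .gz parts need no special handling. A minimal round-trip sketch, with illustrative paths and a local-mode setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.io.compress.GzipCodec

val sc = new SparkContext(
  new SparkConf().setAppName("gzip-roundtrip").setMaster("local[2]"))
val out = "hdfs:///lz/streaming/am/demo"   // illustrative output path

// Write: passing the codec class makes each part file a part-NNNNN.gz
sc.parallelize(Seq("a", "b", "c")).saveAsTextFile(out, classOf[GzipCodec])

// Read: textFile infers gzip from the .gz extension, no extra arguments.
// Caveat: gzip is not splittable, so each .gz file becomes one partition.
val lines = sc.textFile(out)
```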

Re: SparkContext initialization error- java.io.IOException: No space left on device

2015-09-06 Thread shenyan zhen
/wiki.gentoo.org/wiki/Knowledge_Base:No_space_left_on_device_while_there_is_plenty_of_space_available > > On Sun, Sep 6, 2015 at 6:15 AM, Shixiong Zhu <zsxw...@gmail.com> wrote: > >> The folder is in "/tmp" by default. Could you use "df -h" to check the >> free space of /tmp? >> >> Best Regards, >> Shixiong Zhu

SparkContext initialization error- java.io.IOException: No space left on device

2015-09-04 Thread shenyan zhen
Has anyone seen this error? Not sure which dir the program was trying to write to. I am running Spark 1.4.1, submitting Spark job to Yarn, in yarn-client mode. 15/09/04 21:36:06 ERROR SparkContext: Error adding jar (java.io.IOException: No space left on device), was the --addJars option used?
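As the follow-ups in this thread note, the directory in question is Spark's local scratch space, which defaults to /tmp (controlled by spark.local.dir for the driver; on YARN the executors use the NodeManager's local dirs instead). A sketch of the usual remedies, with an illustrative mount point:

```
# Check free space where the driver stages jars (default /tmp)
df -h /tmp

# spark-defaults.conf: point the scratch space at a larger volume
spark.local.dir  /mnt/bigdisk/spark-tmp
```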

Re: Fighting against performance: JDBC RDD badly distributed

2015-07-28 Thread shenyan zhen
Hi Saif, Are you using JdbcRDD directly from Spark? If yes, then the poor distribution could be due to the bound key you used. See the JdbcRDD Scala doc at https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.JdbcRDD : sql the text of the query. The query must contain
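JdbcRDD's distribution comes entirely from how it splits the bound key: the [lowerBound, upperBound] range is divided into numPartitions roughly equal sub-ranges, each substituted into the two `?` placeholders the query must contain, so a skewed or mis-sized key range yields the poor distribution discussed here. A sketch, assuming an active SparkContext `sc` and an illustrative `events` table keyed by a roughly uniform `id` column:

```scala
import java.sql.DriverManager
import org.apache.spark.rdd.JdbcRDD

// The query MUST contain exactly two '?' placeholders; JdbcRDD fills them
// with per-partition lower/upper key bounds derived from the range below.
val rdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:postgresql://host/db"),
  "SELECT id, payload FROM events WHERE id >= ? AND id <= ?",
  lowerBound = 1L,         // smallest key value in the table
  upperBound = 10000000L,  // largest key value in the table
  numPartitions = 20,      // 20 even key ranges of ~500k ids each
  r => (r.getLong(1), r.getString(2)))
```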

Re: Fighting against performance: JDBC RDD badly distributed

2015-07-28 Thread shenyan zhen
your objective and show some code snippet? Shenyan On Tue, Jul 28, 2015 at 3:23 PM, saif.a.ell...@wellsfargo.com wrote: Thank you for your response Zhen, I am using some vendor-specific JDBC driver JAR file (honestly I don't know where it came from). Its API is NOT like JdbcRDD, instead

Re: Meets class not found error in spark console with newly hive context

2015-07-02 Thread shenyan zhen
In case it helps: I got around it temporarily by saving and resetting the context class loader around creating the HiveContext. On Jul 2, 2015 4:36 AM, Terry Hole hujie.ea...@gmail.com wrote: Found this is a bug in spark 1.4.0: SPARK-8368 https://issues.apache.org/jira/browse/SPARK-8368 Thanks!
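The workaround described, preserving the thread's context class loader across HiveContext construction (the SPARK-8368 bug clobbers it), can be sketched like this, assuming an active SparkContext `sc`:

```scala
import org.apache.spark.sql.hive.HiveContext

// Save the loader, create the context, then restore the loader even if
// construction fails.
val saved = Thread.currentThread().getContextClassLoader
val hiveContext =
  try new HiveContext(sc)
  finally Thread.currentThread().setContextClassLoader(saved)
```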

Re: Spark Cluster Benchmarking Frameworks

2015-06-03 Thread Zhen Jia
Hi Jonathan, Maybe you can try BigDataBench: http://prof.ict.ac.cn/BigDataBench/ . It provides lots of workloads, including both Hadoop and Spark based workloads. Zhen Jia hodgesz wrote Hi Spark Experts, I am curious what people are using to benchmark

Re: Driver hangs on running mllib word2vec

2015-01-05 Thread Eric Zhen
depends on the vocabSize. Even without overflow, there are still other bottlenecks, for example, syn0Global and syn1Global, each of them has vocabSize * vectorSize elements. Thanks. Zhan Zhang On Jan 5, 2015, at 7:47 PM, Eric Zhen zhpeng...@gmail.com wrote: Hi Xiangrui, Our dataset is about

Re: Driver hangs on running mllib word2vec

2015-01-05 Thread Eric Zhen
is the vocabulary size? -Xiangrui On Sun, Jan 4, 2015 at 11:18 PM, Eric Zhen zhpeng...@gmail.com wrote: Hi, When we run mllib word2vec (spark-1.1.0), the driver gets stuck with 100% CPU usage. Here is the jstack output: main prio=10 tid=0x40112800 nid=0x46f2 runnable [0x4162e000

Driver hangs on running mllib word2vec

2015-01-04 Thread Eric Zhen
Hi, When we run mllib word2vec (spark-1.1.0), the driver gets stuck with 100% CPU usage. Here is the jstack output: main prio=10 tid=0x40112800 nid=0x46f2 runnable [0x4162e000] java.lang.Thread.State: RUNNABLE at
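As the replies point out, the hot arrays (syn0Global, syn1Global) each hold vocabSize × vectorSize floats, so trimming the vocabulary and vector size are the main levers. A sketch with illustrative values, assuming a tokenized RDD of sentences named `docs` (note that setMinCount appeared in releases after the 1.1.0 used in this thread):

```scala
import org.apache.spark.mllib.feature.Word2Vec
import org.apache.spark.rdd.RDD

// docs: RDD[Seq[String]] of tokenized sentences (assumed to exist)
def train(docs: RDD[Seq[String]]) = new Word2Vec()
  .setVectorSize(100)  // smaller vectors shrink syn0/syn1 proportionally
  .setMinCount(10)     // prune rare words to cut vocabSize
  .fit(docs)
```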

Re: SparkSQL exception on spark.sql.codegen

2014-11-18 Thread Eric Zhen
won't have the resources to investigate backporting a fix. However, if you can reproduce the problem in Spark 1.2 then please file a JIRA. On Mon, Nov 17, 2014 at 9:37 PM, Eric Zhen zhpeng...@gmail.com wrote: Yes, it always appears on a subset of the tasks in a stage (i.e. 100/100 (65

Re: SparkSQL exception on spark.sql.codegen

2014-11-17 Thread Eric Zhen
Hi Michael, We use Spark v1.1.1-rc1 with jdk 1.7.0_51 and scala 2.10.4. On Tue, Nov 18, 2014 at 7:09 AM, Michael Armbrust mich...@databricks.com wrote: What version of Spark SQL? On Sat, Nov 15, 2014 at 10:25 PM, Eric Zhen zhpeng...@gmail.com wrote: Hi all, We run SparkSQL on TPCDS

Re: SparkSQL exception on spark.sql.codegen

2014-11-17 Thread Eric Zhen
at 7:04 PM, Eric Zhen zhpeng...@gmail.com wrote: Hi Michael, We use Spark v1.1.1-rc1 with jdk 1.7.0_51 and scala 2.10.4. On Tue, Nov 18, 2014 at 7:09 AM, Michael Armbrust mich...@databricks.com wrote: What version of Spark SQL? On Sat, Nov 15, 2014 at 10:25 PM, Eric Zhen zhpeng

SparkSQL exception on spark.sql.codegen

2014-11-15 Thread Eric Zhen
Hi all, We run SparkSQL on TPCDS benchmark Q19 with spark.sql.codegen=true, we got exceptions as below, has anyone else saw these before? java.lang.ExceptionInInitializerError at org.apache.spark.sql.execution.SparkPlan.newProjection(SparkPlan.scala:92) at
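For context, spark.sql.codegen was an experimental flag in Spark 1.1 that compiles expression evaluation at runtime; when the generated classes fail to initialize (the ExceptionInInitializerError above), disabling it is the usual first workaround. A sketch, assuming an existing SQLContext `sqlContext`:

```scala
// Toggle runtime code generation (experimental in Spark 1.1); turning it
// off trades some per-row speed for avoiding class-initialization failures.
sqlContext.setConf("spark.sql.codegen", "false")
```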

spark master UI does not keep detailed application history

2014-06-13 Thread zhen
I have been trying to get detailed history of previous spark shell executions (after exiting spark shell). In standalone mode and Spark 1.0, I think the spark master UI is supposed to provide detailed execution statistics of all previously run jobs. This is supposed to be viewable by clicking on
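For what it's worth, the standalone master can only rebuild a finished application's detail UI if the application wrote event logs; without them the history link shows nothing after the shell exits. A sketch of the relevant settings (the log directory is illustrative and must exist and be writable):

```
# spark-defaults.conf
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs:///spark_logs
```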

Re: multiple passes in mapPartitions

2014-06-13 Thread zhen
Thank you for your suggestion. We will try it out and see how it performs. We think the single call to mapPartitions will be faster but we could be wrong. It would be nice to have a clone method on the iterator. -- View this message in context:

multiple passes in mapPartitions

2014-06-12 Thread zhen
the memory, which is also bad in terms of more GC. Is there a faster/better way of taking multiple passes without copying all the data? Thank you, Zhen -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/multiple-passes-in-mapPartitions-tp7555.html Sent from
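Since Scala's Iterator is single-pass, a second pass inside mapPartitions does require materializing the partition first, which is exactly the copy/GC cost the post worries about. A sketch of the two-pass pattern, assuming an RDD[Double] named `data` is in scope:

```scala
// Center each partition's values around that partition's mean: two passes.
val centered = data.mapPartitions { iter =>
  val buf = iter.toArray           // materialize once; iter is now exhausted
  val mean = buf.sum / buf.length  // pass 1 over the buffer
  buf.iterator.map(_ - mean)       // pass 2, evaluated lazily downstream
}
```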

Re: problem starting the history server on EC2

2014-06-11 Thread zhen
then started the history server as follows. ./start-history-server.sh hdfs:///spark_logs --port 18080 In order to see the history server UI I needed to open up inbound traffic for port 18080 in AWS, as follows: custom TCP, port 18080, from anywhere. Hope this will help others. Zhen -- View

problem starting the history server on EC2

2014-06-10 Thread zhen
not exist. But I have definitely created the directory and made sure everyone can read/write/execute in the directory. Can you tell me why it does not work? Thank you Zhen -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-starting-the-history-server

Re: problem starting the history server on EC2

2014-06-10 Thread zhen
root root 4096 Jun 11 02:08 tmp drwxrwxrwx 2 root root 4096 Jun 11 02:08 spark_log Thanks Zhen -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-starting-the-history-server-on-EC2-tp7361p7370.html Sent from the Apache Spark User List mailing list

Re: A new resource for getting examples of Spark RDD API calls

2014-05-21 Thread zhen
Great, thanks for that tip. I will update the documents! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/A-new-resource-for-getting-examples-of-Spark-RDD-API-calls-tp5529p6210.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

A new resource for getting examples of Spark RDD API calls

2014-05-15 Thread zhen
into it. Hope you find it useful. Zhen -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/A-new-resource-for-getting-examples-of-Spark-RDD-API-calls-tp5529.html