Error when cache partitioned Parquet table

2015-01-26 Thread ZHENG, Xu-dong
) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) -- 郑旭东 ZHENG, Xu-dong

Re: spark sql - save to Parquet file - Unsupported datatype TimestampType

2014-12-08 Thread ZHENG, Xu-dong
-- 郑旭东 ZHENG, Xu-dong

Re: Is there any way to control the parallelism in LogisticRegression

2014-08-21 Thread ZHENG, Xu-dong
RDD.repartition(). For coalesce without shuffle, I don't know how to set the right number of partitions either ... -Xiangrui On Tue, Aug 12, 2014 at 6:16 AM, ZHENG, Xu-dong dong...@gmail.com wrote: Hi Xiangrui, Thanks for your reply! Yes, our data is very sparse, but RDD.repartition invoke

Re: Is there any way to control the parallelism in LogisticRegression

2014-08-12 Thread ZHENG, Xu-dong
On Mon, Aug 11, 2014 at 10:39 PM, ZHENG, Xu-dong dong...@gmail.com wrote: I think this has the same effect and the same issue as #1, right? On Tue, Aug 12, 2014 at 1:08 PM, Jiusheng Chen chenjiush...@gmail.com wrote: How about increasing the HDFS file extent size? For example, if the current value is 128M, we

Re: Spark SQL JDBC

2014-08-12 Thread ZHENG, Xu-dong
(SparkSubmit.scala:73) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- 郑旭东 ZHENG, Xu-dong

Re: Spark SQL JDBC

2014-08-12 Thread ZHENG, Xu-dong
PM, ZHENG, Xu-dong dong...@gmail.com wrote: Hi Cheng, I also ran into some issues when trying to start ThriftServer from a build of the master branch (I could successfully run it from the branch-1.0-jdbc branch). Below is my build command: ./make-distribution.sh --skip-java-test -Phadoop-2.4 -Phive

Is there any way to control the parallelism in LogisticRegression

2014-08-11 Thread ZHENG, Xu-dong
a lot of 'ANY' tasks, which means that tasks read data from other nodes and become slower than tasks that read data from local memory. I think the best way should be something like #3, but leveraging locality as much as possible. Is there any way to do that? Any suggestions? Thanks! -- ZHENG, Xu-dong

Re: Is there any way to control the parallelism in LogisticRegression

2014-08-11 Thread ZHENG, Xu-dong
I think this has the same effect and the same issue as #1, right? On Tue, Aug 12, 2014 at 1:08 PM, Jiusheng Chen chenjiush...@gmail.com wrote: How about increasing the HDFS file extent size? For example, if the current value is 128M, we make it 512M or bigger. On Tue, Aug 12, 2014 at 11:46 AM, ZHENG, Xu-dong dong