Re: How to disable input split

2014-10-18 Thread Davies Liu
You can call coalesce() to merge the small splits into bigger ones. Davies On Fri, Oct 17, 2014 at 5:35 PM, Larry Liu larryli...@gmail.com wrote: Is it possible to disable input split if input is already small?
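Davies's suggestion could be sketched in PySpark roughly as follows (a minimal sketch, not from the thread; the input path and partition count are hypothetical, and it assumes a local Spark installation):

```python
from pyspark import SparkContext

sc = SparkContext("local[2]", "coalesce-sketch")

# Many small input files typically yield one partition per input split.
rdd = sc.textFile("hdfs:///data/small-files/*")  # hypothetical path

# coalesce() merges the small splits into fewer, bigger partitions
# without a shuffle (pass shuffle=True to force a full repartition).
merged = rdd.coalesce(4)
print(merged.getNumPartitions())
```

Note that coalesce() only reduces the partition count; it does not change how the input was split on read.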

Re: input split size

2014-10-18 Thread Ilya Ganelin
Also - if you're doing a text file read you can pass the number of resulting partitions as the second argument. On Oct 17, 2014 9:05 PM, Larry Liu larryli...@gmail.com wrote: Thanks, Andrew. What about reading out of local? On Fri, Oct 17, 2014 at 5:38 PM, Andrew Ash and...@andrewash.com
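Ilya's point, sketched below (hedged; the same second argument works for a local path as for HDFS, and the path here is hypothetical):

```python
from pyspark import SparkContext

sc = SparkContext("local[4]", "min-partitions-sketch")

# The second argument to textFile() is minPartitions: a lower bound on
# the number of resulting partitions, not an exact count.
rdd = sc.textFile("file:///tmp/input.txt", 8)  # hypothetical local path
print(rdd.getNumPartitions())
```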

a hivectx insertinto issue - can the insertInto function be applied to a hive table

2014-10-18 Thread valgrind_girl
The complete code is as follows: JavaHiveContext ctx; JavaSchemaRDD schemas = ctx.jsonRDD(arg0); schemas.insertInto("test", true); JavaSchemaRDD teeagers = ctx.hql("SELECT a, b FROM test"); List&lt;String&gt; teeagerNames1 = teeagers.map(new Function&lt;Row, String&gt;()

Re: Spark/HIVE Insert Into values Error

2014-10-18 Thread Cheng Lian
Currently Spark SQL uses Hive 0.12.0, which doesn't support the INSERT INTO ... VALUES ... syntax. On 10/18/14 1:33 AM, arthur.hk.c...@gmail.com wrote: Hi, When trying to insert records into HIVE, I got error, My Spark is 1.1.0 and Hive 0.12.0 Any idea what would be wrong? Regards Arthur

Re: Unable to connect to Spark thrift JDBC server with pluggable authentication

2014-10-18 Thread Cheng Lian
Hi Jenny, how did you configure the classpath and start the Thrift server (YARN client/YARN cluster/standalone/...)? On 10/18/14 4:14 AM, Jenny Zhao wrote: Hi, if Spark thrift JDBC server is started with non-secure mode, it is working fine. with a secured mode in case of pluggable

Re: a hivectx insertinto issue - can the insertInto function be applied to a hive table

2014-10-18 Thread Cheng Lian
In your JSON snippet, 111 and 222 are quoted, i.e. they are strings. Thus they are automatically inferred as string rather than tinyint by jsonRDD. Try this in Spark shell: val sparkContext = sc import org.apache.spark.sql._ import sparkContext._ val sqlContext = new
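The quoting distinction Cheng describes can be seen with plain Python's json module as well (a minimal illustration independent of Spark; jsonRDD's schema inference follows the same JSON typing rules):

```python
import json

# "111" is quoted, so it parses as a string; 222 is unquoted, so it
# parses as a number. jsonRDD infers column types from exactly this.
record = json.loads('{"a": "111", "b": 222}')

print(type(record["a"]))  # <class 'str'>
print(type(record["b"]))  # <class 'int'>
```

To get a numeric column, the JSON values themselves must be unquoted.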

Fwd: Oryx + Spark mllib

2014-10-18 Thread Debasish Das
Hi, Is anyone working on a project integrating the Oryx model-serving layer with Spark? Models will be built using either streaming data or batch data in HDFS and cross-validated with MLlib APIs, but the model-serving layer will give API endpoints like Oryx and read the models may be from

Re: input split size

2014-10-18 Thread Mayur Rustagi
Does it retain the order if it's pulling from the HDFS blocks? Meaning, if file1 = a, b, c partitions in order, and I convert to a 2-partition read, will it map to ab, c or a, bc, or could it also be a, cb? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi

Spark speed performance

2014-10-18 Thread jan.zikes
Hi, I have a program that I wrote for single-computer execution (in Python) and also implemented the same for Spark. This program basically only reads .json, takes one field from it, and saves it back. Using Spark, my program runs approximately 100 times slower on 1 master and 1 slave. So I

What executes on worker and what executes on driver side

2014-10-18 Thread Saurabh Wadhawan
Hi, I have the following questions: 1. When I write a Spark script, how do I know what part runs on the driver side and what runs on the worker side? So let's say I write code to read a plain text file. Will it run on the driver side only, the worker side only, or on both sides

Re: Spark speed performance

2014-10-18 Thread Evan Sparks
How many files do you have and how big is each JSON object? Spark works better with a few big files vs many smaller ones. So you could try cat'ing your files together and rerunning the same experiment. - Evan On Oct 18, 2014, at 12:07 PM, jan.zi...@centrum.cz jan.zi...@centrum.cz wrote:
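Evan's suggestion of concatenating the small files can be done with cat, or portably in plain Python (a minimal sketch; the file-name patterns are hypothetical, and it assumes newline-delimited JSON so records stay intact across file boundaries):

```python
import glob
import shutil

# Concatenate many small JSON-lines files into one big file, so Spark
# sees fewer, larger inputs instead of one tiny split per file.
with open("combined.json", "wb") as out:
    for path in sorted(glob.glob("part-*.json")):  # hypothetical names
        with open(path, "rb") as part:
            shutil.copyfileobj(part, out)
```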

Re: Could Spark make use of Intel Xeon Phi?

2014-10-18 Thread Andrew Ash
Hi Lang, If the Linux kernel on those machines recognizes all the cores, then Spark will use them all naturally with no extra work. Are you seeing otherwise? Andrew On Oct 9, 2014 2:00 PM, Lang Yu lysubscr...@gmail.com wrote: Hi, Currently all the workloads are run on CPUs. Is it possible that

Re: Spark speed performance

2014-10-18 Thread Davies Liu
How many CPUs on the slave? Because of the overhead between the JVM and Python, a single task will be slower than your local Python script, but it's very easy to scale to many CPUs. Even with one CPU, it's not common for PySpark to be 100 times slower. You have many small files; each file will be processed

Re: mllib.linalg.Vectors vs Breeze?

2014-10-18 Thread Matei Zaharia
toBreeze is private within Spark, it should not be accessible to users. If you want to make a Breeze vector from an MLlib one, it's pretty straightforward, and you can make your own utility function for it. Matei On Oct 17, 2014, at 5:09 PM, Sean Owen so...@cloudera.com wrote: Yes, I

why fetch failed

2014-10-18 Thread marylucy
When doing groupBy on big data (maybe 500g), some partition tasks succeed and some fail with a fetch failed error. Spark retries the previous stage, but it always fails. 6 computers: 384g. Worker: 40g*7 for one computer. Can anyone tell me why the fetch failed?

Re: input split size

2014-10-18 Thread Aaron Davidson
The minPartitions argument of textFile/hadoopFile cannot decrease the number of splits past the physical number of blocks/files. So if you have 3 HDFS blocks, asking for 2 minPartitions will still give you 3 partitions (hence the min). It can, however, convert a file with fewer HDFS blocks into

Spark SQL on XML files

2014-10-18 Thread gtinside
Hi, I have a bunch of XML files and I want to run Spark SQL on them. Is there a recommended approach? I am thinking of converting the XML to JSON and then using jsonRDD. Please let me know your thoughts. Regards, Gaurav
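The XML-to-JSON conversion Gaurav describes could be sketched with Python's standard library before handing the result to jsonRDD (the element and field names here are hypothetical, and it assumes flat, record-per-element XML):

```python
import json
import xml.etree.ElementTree as ET

# Convert one flat XML record into a JSON line that jsonRDD can ingest.
xml_record = "<trade><symbol>ABC</symbol><price>10.5</price></trade>"

root = ET.fromstring(xml_record)
as_dict = {child.tag: child.text for child in root}
json_line = json.dumps(as_dict)

print(json_line)
```

Note that all values come out as JSON strings this way; numeric fields would need an explicit cast before or after schema inference.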

Re: mllib.linalg.Vectors vs Breeze?

2014-10-18 Thread Sean Owen
Oops yes it is. Been working inside the Spark packages too long. Ignore that comment. On Oct 19, 2014 1:42 AM, Matei Zaharia matei.zaha...@gmail.com wrote: toBreeze is private within Spark, it should not be accessible to users. If you want to make a Breeze vector from an MLlib one, it's pretty

Submissions open for Spark Summit East 2015

2014-10-18 Thread Matei Zaharia
After successful events in the past two years, the Spark Summit conference has expanded for 2015, offering both an event in New York on March 18-19 and one in San Francisco on June 15-17. The conference is a great chance to meet people from throughout the Spark community and see the latest

Re: input split size

2014-10-18 Thread Nicholas Chammas
Side note: I thought bzip2 was splittable. Perhaps you meant gzip? On Saturday, October 18, 2014, Aaron Davidson ilike...@gmail.com wrote: The minPartitions argument of textFile/hadoopFile cannot decrease the number of splits past the physical number of blocks/files. So if you have 3 HDFS blocks,