Re: will/when Spark/SparkSQL will support ORCFile format

2014-10-09 Thread James Yu
Performance-wise, will support for foreign data formats be the same as for native ones?

Thanks,
James


On Wed, Oct 8, 2014 at 11:03 PM, Cheng Lian lian.cs@gmail.com wrote:

 The foreign data source API PR also matters here
 https://www.github.com/apache/spark/pull/2475

 Foreign data sources like ORC can be added more easily and systematically
 after this PR is merged.

 On 10/9/14 8:22 AM, James Yu wrote:

 Thanks Mark! I will keep an eye on it.

 @Evan, I have seen people use both formats, so I really want Spark to
 support ORCFile.


 On Wed, Oct 8, 2014 at 11:12 AM, Mark Hamstra m...@clearstorydata.com
 wrote:

  https://github.com/apache/spark/pull/2576



 On Wed, Oct 8, 2014 at 11:01 AM, Evan Chan velvia.git...@gmail.com
 wrote:

  James,

 Michael at the meetup last night said there was some development
 activity around ORCFiles.

 I'm curious though, what are the pros and cons of ORCFiles vs Parquet?

 On Wed, Oct 8, 2014 at 10:03 AM, James Yu jym2...@gmail.com wrote:

 I didn't see anyone ask this question before, but I was wondering if anyone
 knows whether Spark/SparkSQL will support the ORCFile format soon? ORCFile is
 getting more and more popular in the Hive world.

 Thanks,
 James

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org






Re: will/when Spark/SparkSQL will support ORCFile format

2014-10-09 Thread Michael Armbrust
Yes, the foreign sources work is only about exposing a stable set of APIs
for external libraries to link against (to avoid the Spark assembly
becoming a dependency mess). The code path these APIs use will be the same
as that for data sources included in the core Spark SQL library.

Michael
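
To illustrate the point about a shared code path, here is a minimal sketch in plain Python. It is not the Spark SQL API: the names `Relation`, `BuiltInRelation`, `ExternalRelation`, and `select_column` are hypothetical, and the rows are hard-coded rather than read from Parquet or ORC files. The only idea it shows is that when built-in and external sources implement the same stable interface, the engine runs the same code for both.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple

Row = Tuple[object, ...]


class Relation(ABC):
    """Hypothetical stable interface; NOT the actual Spark SQL API."""

    @abstractmethod
    def schema(self) -> List[str]:
        """Column names, in row order."""

    @abstractmethod
    def scan(self) -> Iterator[Row]:
        """Yield every row of the relation."""


class BuiltInRelation(Relation):
    """Stands in for a source shipped inside the core library."""

    def schema(self) -> List[str]:
        return ["id", "name"]

    def scan(self) -> Iterator[Row]:
        yield from [(1, "a"), (2, "b")]


class ExternalRelation(Relation):
    """Stands in for a source provided by an external library."""

    def schema(self) -> List[str]:
        return ["id", "name"]

    def scan(self) -> Iterator[Row]:
        yield from [(3, "c"), (4, "d")]


def select_column(relation: Relation, column: str) -> List[object]:
    """The single engine code path: it depends only on the Relation interface."""
    index = relation.schema().index(column)
    return [row[index] for row in relation.scan()]


if __name__ == "__main__":
    # Both sources go through exactly the same function.
    print(select_column(BuiltInRelation(), "name"))
    print(select_column(ExternalRelation(), "name"))
```

Because `select_column` never checks which class it was given, an external source in this sketch cannot end up on a slower or different path than a built-in one.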




Re: will/when Spark/SparkSQL will support ORCFile format

2014-10-09 Thread James Yu
Sounds great, thanks!








will/when Spark/SparkSQL will support ORCFile format

2014-10-08 Thread James Yu
I didn't see anyone ask this question before, but I was wondering if anyone
knows whether Spark/SparkSQL will support the ORCFile format soon? ORCFile is
getting more and more popular in the Hive world.

Thanks,
James


Re: will/when Spark/SparkSQL will support ORCFile format

2014-10-08 Thread Evan Chan
James,

Michael at the meetup last night said there was some development
activity around ORCFiles.

I'm curious though, what are the pros and cons of ORCFiles vs Parquet?
