Re: Will Pig support SQL?

Dmitriy Ryaboy Mon, 08 Feb 2010 07:17:29 -0800

Jian,
If what you are looking for is something that will let you deal with
skewed data and forget about how the underlying distributed system
works, both Pig and Hive will help you do that to some extent. If you
are looking for something that will let you exercise fine-grained
control over individual scheduling of tasks, which is what this sounds
like, neither project is for you -- in fact, this is more or less the
opposite of what they are trying to do, which is to take away the
complexities of partitioning large data sets, scheduling tasks, and
orchestrating data flows.
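
To make "dealing with skewed data" concrete, here is a minimal sketch of the general idea behind skew-aware partitioning: count (or sample) key frequencies, then spread the records of any hot key over several reducers instead of one. This is an illustration only, not the actual Pig or Hive implementation; the names find_hot_keys, stable_hash, partition, threshold and fanout are all hypothetical.

```python
import zlib
from collections import Counter


def find_hot_keys(keys, threshold):
    """Return the set of keys that occur more than `threshold` times."""
    counts = Counter(keys)
    return {k for k, c in counts.items() if c > threshold}


def stable_hash(text):
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return zlib.crc32(text.encode("utf-8"))


def partition(keys, hot_keys, num_reducers, fanout):
    """Return the number of records each reducer would receive.

    Normal keys go to hash(key) % num_reducers. Records of a hot key are
    dealt round-robin across `fanout` consecutive reducers.
    """
    load = [0] * num_reducers
    seen = Counter()  # how many records of each hot key we have placed
    for key in keys:
        if key in hot_keys:
            salt = seen[key] % fanout
            seen[key] += 1
            reducer = (stable_hash(key) + salt) % num_reducers
        else:
            reducer = stable_hash(key) % num_reducers
        load[reducer] += 1
    return load


# Hypothetical input: one key dominates the data set.
keys = ["a"] * 90 + ["b"] * 5 + ["c"] * 5
hot = find_hot_keys(keys, threshold=50)
naive = partition(keys, set(), num_reducers=4, fanout=1)
balanced = partition(keys, hot, num_reducers=4, fanout=4)
print("naive load:   ", naive)
print("balanced load:", balanced)
```

Note that for a join, the rows from the other input that match a hot key would also have to be sent to every reducer that holds a slice of that key; the sketch leaves that step out.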


If you are looking to tweak the Hadoop internals to schedule things
differently, you may find the pluggable scheduler interface
useful. If you manage to achieve your goals by constructing a new
scheduler, Pig and Hive will both continue working as higher-level
abstractions, as long as you adhere to the provided interface for task
scheduling.


On Mon, Feb 8, 2010 at 2:05 AM, jian yi <[email protected]> wrote:
> We can regard a task as a sleep call, where the parameter of sleep is
> how long the task runs.
> sleep(N) - For hive ,the N is not certain
> sleep(M) - For MBR, the M is certain
>
> 2010/2/8 jian yi <[email protected]>
>
>> Hi Jeff,
>>
>> Thank you Jeff.
>> I know Hive can handle skewed joins, but I think that is not enough:
>> 1. Sampling has a cost
>> 2. Can't control the size of a task
>> 3. Not exact
>> 4. Must use Hive or Pig
>>
>> I think adding a balance step between map and reduce is a fundamental
>> solution to the skew problem. Maybe I need to explain it in more detail.
>>
>> Regards
>> Jian YI
>>
>> 2010/2/8 Jeff Hammerbacher <[email protected]>
>>
>>> Hey Jian,
>>>
>>> Hive supports arbitrary procedural languages through Hadoop Streaming; see
>>> http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform for more.
>>>
>>> Also, both Hive and Pig have support for handling skewed joins if you use
>>> their higher-level interface. See
>>> https://issues.apache.org/jira/browse/HIVE-562 and
>>> http://wiki.apache.org/pig/PigSkewedJoinSpec.
>>>
>>> Thanks,
>>> Jeff
>>>
>>> On Sun, Feb 7, 2010 at 4:13 AM, jian yi <[email protected]> wrote:
>>>
>>> > Hey Jeff,
>>> >
>>> > Thank you, Jeff.
>>> > By "procedure" I mean a procedural language, like Oracle PL/SQL, which is
>>> > very helpful for migrating old services. We want to build a data warehouse
>>> > based on the MapReduce engine. I plan to optimize MapReduce to solve the
>>> > skew problem by adding a balance between map and reduce. Please refer to
>>> > http://bbs.hadoopor.com/thread-521-1-1.html
>>> >
>>> > Regards,
>>> > Jian
>>> >
>>> > 2010/2/7 Jeff Hammerbacher <[email protected]>
>>> >
>>> > > Hey Jian,
>>> > >
>>> > > I'm not sure what you mean by "Hive don't support procedure", but in any
>>> > > case, the Pig team has stated that they will support SQL over the Pig
>>> > > execution engine. See https://issues.apache.org/jira/browse/PIG-824.
>>> > >
>>> > > Regards,
>>> > > Jeff
>>> > >
>>> > > On Sat, Feb 6, 2010 at 6:16 PM, jian yi <[email protected]> wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > SQL is very helpful to develop data warehouse, but Hive don't support
>>> > > > procedure. If Pig support SQL, it will be more powerful.
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
