This is probably because the current thrift-server implementation has `SparkContext` inside (see: https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala#L34 ). To support yarn-cluster, we would need to add a lot of functionality to deploy the thrift-server itself in a cluster, and it seems to me there are many technical issues around this.
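As a side note, the per-instance isolation Jeff suggests below (a separate SPARK_CONF_DIR, SPARK_LOG_DIR, SPARK_PID_DIR, and HIVE_SERVER2_THRIFT_PORT for each instance) could be sketched roughly like this. The instance names, directory paths, and port numbers are illustrative placeholders, not Spark defaults, and the launch command is echoed rather than executed so the sketch can be read without a Spark installation:

```shell
# Sketch: run two Spark Thrift Server instances on one machine by giving
# each one its own conf/log/pid directories and a distinct thrift port.
# All paths and ports below are made-up examples.

start_instance() {
  name="$1"   # instance name, e.g. instance1
  port="$2"   # distinct HIVE_SERVER2_THRIFT_PORT per instance

  # Echo the launch command instead of running it, so this stays a sketch.
  echo "SPARK_CONF_DIR=/etc/spark/${name}/conf" \
       "SPARK_LOG_DIR=/var/log/spark/${name}" \
       "SPARK_PID_DIR=/var/run/spark/${name}" \
       "HIVE_SERVER2_THRIFT_PORT=${port}" \
       '$SPARK_HOME/sbin/start-thriftserver.sh --master yarn'
}

start_instance instance1 10001
start_instance instance2 10002
```

The key point is that the pid file (which caused the "Stop it first" error), the logs, and the listening port must not collide between the two instances; separate SPARK_CONF_DIR values also let each instance carry its own spark-env.sh and spark-defaults.conf.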
// maropu

On Fri, Jul 1, 2016 at 1:38 PM, Egor Pahomov <pahomov.e...@gmail.com> wrote:

> What about yarn-cluster mode?
>
> 2016-07-01 11:24 GMT-07:00 Egor Pahomov <pahomov.e...@gmail.com>:
>
>> To separate bad users with bad queries from good users with good queries.
>> Spark does not provide any scope separation out of the box.
>>
>> 2016-07-01 11:12 GMT-07:00 Jeff Zhang <zjf...@gmail.com>:
>>
>>> I think so. Any reason you want to deploy multiple thrift servers on
>>> one machine?
>>>
>>> On Fri, Jul 1, 2016 at 10:59 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
>>>
>>>> Takeshi, of course I used a different HIVE_SERVER2_THRIFT_PORT.
>>>> Jeff, thanks, I will try, but from your answer I'm getting the feeling
>>>> that I'm trying a very rare case?
>>>>
>>>> 2016-07-01 10:54 GMT-07:00 Jeff Zhang <zjf...@gmail.com>:
>>>>
>>>>> This is not a bug, because these 2 processes use the same
>>>>> SPARK_PID_DIR, which is /tmp by default. Although you can resolve this
>>>>> by using a different SPARK_PID_DIR, I suspect you would still have
>>>>> other issues like port conflicts. I would suggest you deploy one Spark
>>>>> Thrift Server per machine for now. If you stick to deploying multiple
>>>>> Spark Thrift Servers on one machine, then define a different
>>>>> SPARK_CONF_DIR, SPARK_LOG_DIR and SPARK_PID_DIR for your 2 instances.
>>>>> I'm not sure if there are other conflicts, but please try first.
>>>>>
>>>>> On Fri, Jul 1, 2016 at 10:47 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
>>>>>
>>>>>> I get
>>>>>>
>>>>>> "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as
>>>>>> process 28989. Stop it first."
>>>>>>
>>>>>> Is it a bug?
>>>>>>
>>>>>> 2016-07-01 10:10 GMT-07:00 Jeff Zhang <zjf...@gmail.com>:
>>>>>>
>>>>>>> I don't think the one-instance-per-machine limitation is true. As
>>>>>>> long as you resolve the conflicts such as the port, pid file, log
>>>>>>> file, etc., you can run multiple instances of the Spark Thrift
>>>>>>> Server.
>>>>>>>
>>>>>>> On Fri, Jul 1, 2016 at 9:32 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi, I'm using the Spark Thrift JDBC server, and 2 limitations
>>>>>>>> really bother me:
>>>>>>>>
>>>>>>>> 1) One instance per machine
>>>>>>>> 2) Yarn-client only (not yarn-cluster)
>>>>>>>>
>>>>>>>> Are there any architectural reasons for these limitations? About
>>>>>>>> yarn-client I might understand in theory: the master is the same
>>>>>>>> process as the server, so it makes some sense, but it's really
>>>>>>>> inconvenient, since I need a lot of memory on my driver machine.
>>>>>>>> The reason for one instance per machine I do not understand.
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Sincerely yours, Egor Pakhomov*
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Jeff Zhang

--
---
Takeshi Yamamuro