date:20140810

Re: --hiveconf vs -hiveconf

2014-08-10 Thread Lefty Leverenz

All occurrences of -hiveconf in the wiki have been changed to
--hiveconf, except for one new sentence in the CLI command line options
section
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveCommandLineOptions),
which says that -hiveconf is also supported.
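A minimal sketch of what "also supported" means in practice: a parser that treats -hiveconf and --hiveconf as the same option, so older ETL scripts keep working while the docs use only the double-dash form. This is hypothetical illustration code, not Hive's CLI implementation. The class name HiveconfArgs is made up, and hive.root.logger is used only as an example property.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Hive's actual CLI parser: collects property=value
// pairs given with either the legacy -hiveconf or the documented --hiveconf.
public class HiveconfArgs {

    // Both spellings map to the same logical option.
    static boolean isHiveconfFlag(String arg) {
        return arg.equals("--hiveconf") || arg.equals("-hiveconf");
    }

    static Map<String, String> parse(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (!isHiveconfFlag(args[i])) {
                continue; // other arguments are ignored in this sketch
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException(args[i] + " requires property=value");
            }
            String pair = args[++i];
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected property=value but got: " + pair);
            }
            // A later occurrence of the same property overrides an earlier one.
            conf.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return conf;
    }

    public static void main(String[] args) {
        String[] legacy = {"-hiveconf", "hive.root.logger=INFO,console"};
        String[] documented = {"--hiveconf", "hive.root.logger=INFO,console"};
        System.out.println(parse(legacy).equals(parse(documented))); // prints true
    }
}
```

Accepting both spellings in one predicate keeps the compatibility rule in a single place, so the legacy form can be dropped later without touching the rest of the parser.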

The list of docs changed is in the first March 8th message in this thread.

-- Lefty


On Sat, Mar 8, 2014 at 11:55 PM, Lefty Leverenz leftylever...@gmail.com
wrote:

 What's the difference between double-dash options and single-dash options?

 -- Lefty


 On Sat, Mar 8, 2014 at 9:40 AM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

 Great, thanks for following up. There might be a number of ETL processes
 in the wild using -hiveconf, which is why it is important to keep it
 around, for the CLI at least.


 On Sat, Mar 8, 2014 at 1:56 AM, Xuefu Zhang xzh...@cloudera.com wrote:

  This is just getting more and more interesting. I never thought of the
  -hiveconf option, and always assumed it was a typo for --hiveconf. (That's
  why I edited the one that triggered the discovery.) I just checked and
  found that both work, which surprised me.

  Working from that assumption, Beeline implemented only --hiveconf, to
  mimic the CLI.

  As for the documentation, I think we can stick to --hiveconf from now on,
  since it is supported by both the CLI and Beeline. However, -hiveconf will
  continue to work for the CLI until its death.
 
  Thanks,
  Xuefu
 
 
  On Fri, Mar 7, 2014 at 10:36 PM, Lefty Leverenz leftylever...@gmail.com
  wrote:
 
   OK, so just one of the pages in the wiki has changed, and Hive behavior
   has not changed?

   That's right, and a closer look at the wiki shows that all the examples
   use -hiveconf except the new change. The only places --hiveconf appears
   are in duplications of help messages for the hive command, the old Hive
   Server, or Beeline.
  
   In a fresh export of the wiki, --hiveconf occurs in these docs:

   - CLI
     (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveCommandLineOptions)
     repeats what hive -H says (--hiveconf) but gives 3 examples of -hiveconf.
   - Admin Config
     (https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive)
     says --hiveconf twice, in text and an example (both changed this week).
   - Hive Server
     (https://cwiki.apache.org/confluence/display/Hive/HiveServer)
     says --hiveconf once, but that's the Thrift server help message.
   - HiveServer2 Clients
     (https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions)
     says --hiveconf twice, but that's the Beeline option.
  
   These wikidocs say -hiveconf:

   - Getting Started: 4 in the config overview
     (https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ConfigurationManagementOverview)
     and 2 in error logs
     (https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ErrorLogs)
   - Avro SerDe: 2 in example and text
     (https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-SpecifyingtheAvroschemaforatable)
   - Developer Guide: 4 in export HIVE_OPTS
     (https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-RunningHiveWithoutaHadoopCluster)
   - HBase Integration: 2 in examples
     (https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-Usage)
   - Variable Substitution: 1 in the evil laugh example
     (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution)
   - CLI: 2 in one example
     (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-Examples)
     and 1 in logging
     (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-Logging)
  
   (My grep hits were inflated because -i caught HiveConf.)
  
   So what's it supposed to be?
  
  
   -- Lefty
  
  
   On Fri, Mar 7, 2014 at 11:06 PM, Thejas Nair the...@hortonworks.com
   wrote:
  
   OK, so just one of the pages in the wiki has changed, and Hive behavior
   has not changed? (I have been using -hiveconf, but I haven't verified
   that with the tip of trunk as of now.)
   
On Fri, Mar 7, 2014 at 6:19 PM, Xuefu Zhang xzh...@cloudera.com
  wrote:
    I didn't know that -hiveconf is supported. However, hive -H shows
    double dashes:

     -h hostname                 connecting to Hive Server on remote host
     --hiveconf property=value   Use value for given property
     --hivevar key=value         Variable subsitution to apply to hive

 Thanks,
 Xuefu


    On Fri, Mar 7, 2014 at 6:00 PM, Edward Capriolo edlinuxg...@gmail.com
    wrote:

 I was not around when this change was made

[jira] [Assigned] (HIVE-7606) Design SparkSession, SparkSessionManager

2014-08-10 Thread Venki Korukanti (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti reassigned HIVE-7606:
-

Assignee: Venki Korukanti

 Design SparkSession, SparkSessionManager
 

 Key: HIVE-7606
 URL: https://issues.apache.org/jira/browse/HIVE-7606
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland
Assignee: Venki Korukanti

 In this JIRA we'll design two interfaces:
 * SparkSessionState
 * SparkSessionPoolManager
 and then, once those are agreed upon, we'll design two implementations:
 * SparkSessionStateImpl
 * SparkSessionPoolManagerImpl
 The form and function of these will be similar to their Tez equivalents.
 However, TezSessionState provides some implementation that SparkClient
 already provides (refreshLocalResources*). Let's keep SparkSessionState
 lightweight and not remove functionality from SparkClient. The scope of this
 JIRA is just to create the shells and basic functionality. The
 implementations in this JIRA should be able to:
 * Share a SparkSessionImpl across queries
 * Define when a session can be reused
 * Take ownership of SparkContext objects (note that we can only have a single
   SparkContext until SPARK-2243 is resolved)
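One possible shape for the two interfaces and their shell implementations, sketched under stated assumptions: every method name (getSession, returnSession, open, close, getConf) is hypothetical and not taken from Hive, SparkContext is replaced by a comment so the code compiles on its own, and the reuse rule ("same configuration, still open") is just one plausible way to define when a session can be reused. The sketch does not enforce the single-SparkContext limit from SPARK-2243; a real pool would likely need to hold at most one live context per JVM until that is fixed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical interface shapes: every method name here is an assumption.
interface SparkSessionState {
    String getSessionId();
    boolean isOpen();
    Map<String, String> getConf(); // configuration the session was opened with
    void open(Map<String, String> conf);
    void close();
}

interface SparkSessionPoolManager {
    SparkSessionState getSession(Map<String, String> conf);
    void returnSession(SparkSessionState session);
    void shutdown();
}

class SparkSessionStateImpl implements SparkSessionState {
    private final String sessionId;
    private Map<String, String> conf = new HashMap<>();
    private boolean open;

    SparkSessionStateImpl(String sessionId) { this.sessionId = sessionId; }

    public String getSessionId() { return sessionId; }
    public boolean isOpen() { return open; }
    public Map<String, String> getConf() { return conf; }

    public void open(Map<String, String> conf) {
        // A real implementation would take ownership of a SparkContext here.
        this.conf = new HashMap<>(conf);
        this.open = true;
    }

    public void close() {
        // ...and would stop that SparkContext here.
        open = false;
    }
}

public class SparkSessionPoolManagerImpl implements SparkSessionPoolManager {
    private final Deque<SparkSessionState> idle = new ArrayDeque<>();
    private int nextId = 0;

    // Reuse rule assumed for this sketch: hand back an idle, still-open
    // session only if it was opened with exactly the same configuration.
    public synchronized SparkSessionState getSession(Map<String, String> conf) {
        Iterator<SparkSessionState> it = idle.iterator();
        while (it.hasNext()) {
            SparkSessionState s = it.next();
            if (s.isOpen() && s.getConf().equals(conf)) {
                it.remove();
                return s;
            }
        }
        SparkSessionState fresh = new SparkSessionStateImpl("spark-session-" + nextId++);
        fresh.open(conf);
        return fresh;
    }

    public synchronized void returnSession(SparkSessionState session) {
        // Closed sessions are dropped rather than pooled.
        if (session.isOpen()) {
            idle.push(session);
        }
    }

    public synchronized void shutdown() {
        for (SparkSessionState s : idle) {
            s.close();
        }
        idle.clear();
    }
}
```

Keying reuse on the full configuration is deliberately conservative: a session is never shared across queries that asked for different Spark settings, at the cost of opening more sessions than a looser rule would.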



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-7593) Instantiate SparkClient per user session

2014-08-10 Thread Chinna Rao Lalam (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Chinna Rao Lalam updated HIVE-7593:
---

Attachment: HIVE-7593-spark.patch

Instantiate SparkClient per user session

Key: HIVE-7593
URL: https://issues.apache.org/jira/browse/HIVE-7593
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Xuefu Zhang
Assignee: Chinna Rao Lalam
Attachments: HIVE-7593-spark.patch

SparkContext is the main class through which Hive talks to the Spark cluster.
SparkClient encapsulates a SparkContext instance. Currently all user sessions
share a single SparkClient instance in HiveServer2. While this is good enough
for a POC, and even for our first two milestones, it is not desirable in a
multi-tenant environment and gives Hive users the least flexibility. Here is
what we propose:
1. Have a SparkClient instance per user session. The SparkClient instance is
created when the user executes the first query in the session, and it is
destroyed when the user session ends.
2. The SparkClient is instantiated based on the Spark configurations that are
available to the user, including those defined at the global level and those
overridden by the user (through the set command, for instance).
3. Ideally, when the user changes any Spark configuration during the session,
the old SparkClient instance should be destroyed and a new one created based
on the new configurations. This may turn out to be a little hard, so it's a
nice-to-have. If it is not implemented, we need to document that subsequent
configuration changes will not take effect in the current session.
Please note that there is a thread-safety issue on the Spark side: multiple
SparkContext instances cannot coexist in the same JVM (SPARK-2243). We need
to work with the Spark community to get this addressed.
Besides the functional requirements above, avoiding potential issues is also
a consideration. For instance, sharing an SC among users is bad, as resources
(such as jars for UDFs) would also be shared, which is problematic. On the
other hand, one SC per job seems too expensive, as resources would need to be
re-rendered even when nothing has changed.
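The three proposed rules can be sketched as a small per-session lifecycle. Everything here is hypothetical: FakeSparkClient stands in for Hive's SparkClient (no real SparkContext is created), UserSession and its methods are invented names, and treating only keys that start with "spark." as Spark configuration is an assumption. Because of SPARK-2243, the sketch closes the old client before creating its replacement, so two clients never coexist.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Hive's SparkClient; in reality it would wrap a SparkContext.
class FakeSparkClient {
    final Map<String, String> conf;
    boolean closed;

    FakeSparkClient(Map<String, String> conf) { this.conf = new HashMap<>(conf); }

    void close() { closed = true; }
}

public class UserSession {
    private final Map<String, String> globalConf;
    private final Map<String, String> sessionOverrides = new HashMap<>();
    private FakeSparkClient client; // stays null until the first query (rule 1)

    public UserSession(Map<String, String> globalConf) {
        this.globalConf = new HashMap<>(globalConf);
    }

    // Models "set key=value" issued inside the session.
    public void set(String key, String value) {
        sessionOverrides.put(key, value);
    }

    // Rule 2: global settings overlaid with the user's overrides.
    // Assumption: only keys starting with "spark." affect the client.
    Map<String, String> effectiveSparkConf() {
        Map<String, String> merged = new HashMap<>(globalConf);
        merged.putAll(sessionOverrides);
        merged.keySet().removeIf(k -> !k.startsWith("spark."));
        return merged;
    }

    // Called before each query: create lazily, and recreate if the Spark
    // configuration changed (the nice-to-have in rule 3).
    public FakeSparkClient clientForQuery() {
        Map<String, String> conf = effectiveSparkConf();
        if (client != null && !client.conf.equals(conf)) {
            // Close the old client first: SPARK-2243 rules out two
            // SparkContexts in one JVM.
            client.close();
            client = null;
        }
        if (client == null) {
            client = new FakeSparkClient(conf);
        }
        return client;
    }

    // Rule 1: the client is destroyed when the session ends.
    public void close() {
        if (client != null) {
            client.close();
            client = null;
        }
    }
}
```

Comparing the effective configuration on each query, rather than reacting to every set command, means a non-Spark setting such as hive.exec.parallel leaves the existing client in place.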
