Re: Hive MS SQL Server
Thanks Appan, the goal was just to have the metadata backing Hive in SQL Server, not the Hadoop data itself. Our DBAs monitored the SQL generated by DataNucleus against SQL Server and were typically none too happy :-) Hive on SQL Server is therefore a no-go for us at the moment, so we're looking at alternative approaches. Many thanks, Andy.

On 28 March 2011 17:16, Appan Thirumaligai athirumali...@ngmoco.com wrote: Andy, here is a short-term workaround: if your goal is to back up data in SQL Server, just copy the data out of Hadoop and import it into SQL Server (DTS / BCP / SSIS, depending on which version you are on). Email Viral (I guess he is still active on the mailing list); I'm sure he is using Hive, Hadoop and SQL Server. Thanks, Appan

On Mar 25, 2011, at 1:59 AM, shared mailinglists wrote: Good morning, our DBAs created a new schema associated with the database and then made that the default schema for our Hive user. Unfortunately this resulted in the same problem in the logs, "Check of existence of COLUMNS returned table type of VIEW", in that Hive still sees the default SQL Server COLUMNS view and therefore does not create its own COLUMNS table. Is there any way we can configure Hive to use a different table name, or any other approaches we could try? Many thanks, Andy.

On 24 March 2011 17:23, shared mailinglists shared.mailingli...@gmail.com wrote: Hi Carl, many thanks for your suggestions. I will put these to our DBAs and see if we can disable the default schema :-) Will post back soon. Cheers and thanks for the rapid replies, guys. Andy.

On 24 March 2011 17:12, Carl Steinbach c...@cloudera.com wrote: Hi Andy, from what I understand SQL Server has the notion of a default schema (usually dbo) which is used to resolve identifiers that are not defined in a user's current schema. I think you need to either undefine the default schema for your metastore user account, or else make it point to the metastore schema. Here are some relevant links with more information: http://msdn.microsoft.com/en-us/library/ms190387.aspx http://stackoverflow.com/questions/3806245/sql-server-schema-and-default-schema http://dba.fyicenter.com/faq/sql_server_2/Default_Schema_of_Your_Login_Session.html Hope this helps. Carl

On Thu, Mar 24, 2011 at 9:26 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

On Thu, Mar 24, 2011 at 11:36 AM, shared mailinglists shared.mailingli...@gmail.com wrote: Thanks Bennie, hopefully they will. We're a small Java development team within a predominantly MS development house. We're hopefully introducing new ideas, but the normal company politics dictate that we should use SQL Server. That way maintenance, backup, recovery, etc. can be handed over to the internal MS DB team, freeing us to concentrate on better things like Hadoop and Hive :-) I assumed that with the database just being a metadata store it wouldn't be an issue, but we're struggling a bit :-(

On 24 March 2011 15:23, Bennie Schut bsc...@ebuddy.com wrote: Sorry to go a bit off-topic, but how do you get into a situation where SQL Server 2005 becomes a requirement for a Hive internal metastore? I doubt many of the developers of Hive will have access to this database, so I don't expect a lot of response on this. But hopefully someone can prove me wrong :) Bennie.

On 03/24/2011 04:01 PM, shared mailinglists wrote: Hi Hive users :-) Does anybody have experience of using Hive with MS SQL Server 2005? I'm currently stumped with the following issue, https://issues.apache.org/jira/browse/HIVE-1391, where Hive (or DataNucleus?) confuses the COLUMNS table it requires internally with the default SQL Server sys.COLUMNS or information_schema.COLUMNS view, and therefore does not automatically create the required metadata table when running the Hive CLI. Has anybody managed to get Hive to work with SQL Server 2005, or know how I can configure Hive to use a different table name to COLUMNS? Unfortunately we have to use SQL Server and do not have the option to use Derby or MySQL etc. Many thanks, Andy.

Let us not forget that M$ SQL Server is very advanced. It has for a long time supported many things that MySQL just plain did not. (Did we all forget the MySQL 3.x days when we had no transactions or foreign keys? :) There was one ticket I closed on it: https://issues.apache.org/jira/browse/HIVE-1391 As far as Hive is concerned, M$ SQL Server is JPOX/DataNucleus supported, so it should work. How many deployments exist in the wild is unknown.
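For reference, the metastore side of this is plain JDBC through DataNucleus, so pointing Hive at SQL Server only takes a handful of hive-site.xml properties. A minimal sketch, assuming a made-up host, database and login, and that the Microsoft JDBC driver jar is on Hive's classpath:

  <!-- JDBC connection for the Hive metastore (host/database/user are placeholders) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:sqlserver://dbhost:1433;databaseName=hive_metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.microsoft.sqlserver.jdbc.SQLServerDriver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>secret</value>
  </property>

Carl's default-schema suggestion would look something like this on the SQL Server side (schema and user names are placeholders, and your DBAs may prefer a different arrangement):

  -- give the metastore its own schema and make it the login's default
  CREATE SCHEMA hive_meta AUTHORIZATION hiveuser;
  ALTER USER hiveuser WITH DEFAULT_SCHEMA = hive_meta;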
Hive in a readonly mode
Hey all, I bet someone has already asked this question before, but I couldn't find a thread with an answer to it. I want to give analysts in my organization access to Hive in a read-only way, i.e. I don't want them to be able to create or drop tables, alter tables, insert or load. How can I do that? Thanks, anyway
insert - Hadoop vs. Hive
Hi, I'm trying to compare adding files to HDFS for Hive usage via Hive inserts vs. adding them to HDFS directly and then using Hive. Any comments or blog posts about this? Thanks a lot, David Zonsheine
Re: Hive in a readonly mode
On Wed, Mar 30, 2011 at 9:29 AM, Guy Doulberg guy.doulb...@conduit.com wrote: Hey all, I bet someone has already asked this question before, but I couldn't find a thread with an answer to it. I want to give analysts in my organization access to Hive in a read-only way, i.e. I don't want them to be able to create or drop tables, alter tables, insert or load. How can I do that? Thanks, anyway

Hive 0.7.0 has grant/revoke syntax. If you are looking for a simple solution in a pre-Hadoop-security world: recursively chown /user/hive/warehouse to a user different from the user running queries and enable dfs.permissions. This should prevent anyone from writing to the files, although jobs running and then failing will be less than optimal. After a few failures hopefully users will get the point.
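If you do move to 0.7.0, a rough sketch of the grant/revoke route (the table and user names are made up, and the authorization layer has to be switched on first, normally in hive-site.xml):

  -- turn on Hive authorization checks (usually configured in hive-site.xml)
  set hive.security.authorization.enabled=true;

  -- analysts can read but not modify this table
  GRANT SELECT ON TABLE page_views TO USER analyst1;

  -- take the privilege back again
  REVOKE SELECT ON TABLE page_views FROM USER analyst1;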
figuring out the right setting for dfs.datanode.max.xcievers
I haven't found a good description of this setting or the cost of setting it too high; hope somebody can explain. I have about a year's worth of data partitioned by date. Using 10 nodes and setting xcievers to 5000, I can only save into 100 or so partitions. As a result, I have to do 4 rounds of saving data into the underlying partitioned table (in S3). That's pretty slow. Should I just set xcievers to 1M, or will Hadoop crash as a result? Is each xciever really a separate thread? When will the spelling be corrected? :) Thanks a bunch!
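For reference, the knob lives in hdfs-site.xml on each datanode and takes a datanode restart to pick up. A minimal sketch (4096 is only a commonly quoted value, not a recommendation for any particular cluster; the cost of setting it very high is mostly the memory and file descriptors held by each transceiver thread):

  <property>
    <!-- upper bound on concurrent block transceiver threads per datanode -->
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>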
how to convert single line into multiple lines in a serde (txt in txt out)?
Want to extend RegexSerDe to parse Apache web logs: for each log entry, I need to convert it into multiple entries. This is easy in streaming, but I'm new to SerDes, so I'm wondering if it is doable and how? Thanks!
Re: how to convert single line into multiple lines in a serde (txt in txt out)?
On Wed, Mar 30, 2011 at 2:55 PM, Michael Jiang it.mjji...@gmail.com wrote: Want to extend RegexSerDe to parse Apache web logs: for each log entry, I need to convert it into multiple entries. This is easy in streaming, but I'm new to SerDes, so I'm wondering if it is doable and how? Thanks!

You can have your SerDe produce a list<struct> and then explode() them.
Re: insert - Hadoop vs. Hive
If the data is already in the right format you should use the LOAD syntax in Hive. This basically moves the files into HDFS (so it should be no less performant than writing to HDFS directly). If the data is not in the correct format, or it needs to be transformed, then the INSERT statement needs to be used. Ashish

On Mar 30, 2011, at 6:30 AM, David Zonsheine wrote: Hi, I'm trying to compare adding files to HDFS for Hive usage via Hive inserts vs. adding them to HDFS directly and then using Hive. Any comments or blog posts about this? Thanks a lot, David Zonsheine
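A quick sketch of the two routes, with made-up table and path names:

  -- data already in the final format: LOAD just moves the files under the table's directory
  LOAD DATA INPATH '/staging/logs/2011-03-30' INTO TABLE raw_logs PARTITION (ds='2011-03-30');

  -- data needs transforming: INSERT runs a MapReduce job over a source table
  INSERT OVERWRITE TABLE clean_logs PARTITION (ds='2011-03-30')
  SELECT host, request FROM raw_logs WHERE ds = '2011-03-30';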
Re: how to convert single line into multiple lines in a serde (txt in txt out)?
Thanks Edward. You mean implement deserialize to return a list<struct>? What is explode()? Sorry for the basic questions; could you please elaborate a bit more or give me a link to some reference? Thanks!

On Wed, Mar 30, 2011 at 12:03 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

On Wed, Mar 30, 2011 at 2:55 PM, Michael Jiang it.mjji...@gmail.com wrote: Want to extend RegexSerDe to parse Apache web logs: for each log entry, I need to convert it into multiple entries. This is easy in streaming, but I'm new to SerDes, so I'm wondering if it is doable and how? Thanks!

You can have your SerDe produce a list<struct> and then explode() them.
Re: INSERT OVERWRITE LOCAL DIRECTORY -- Why it creates multiple files
Thanks for the suggestion. The query created just one result file. Also, before trying this query, I had found another way of making this work: I added the following properties in hive-site.xml and it worked as well, creating just one result file.

  <property>
    <name>hive.merge.mapredfiles</name>
    <value>true</value>
    <description>Merge small files at the end of a map-reduce job</description>
  </property>
  <property>
    <name>hive.input.format</name>
    <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
    <description>The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombineHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombineHiveInputFormat, it can always be manually set to HiveInputFormat.</description>
  </property>

----- Original Message ----- From: Jov zhao6...@gmail.com To: user@hive.apache.org Sent: Tue, March 29, 2011 10:22:32 PM Subject: Re: INSERT OVERWRITE LOCAL DIRECTORY -- Why it creates multiple files

Try adding a limit:

  INSERT OVERWRITE LOCAL DIRECTORY '/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
  Select host, identity, user, time, request from raw_apachelog where ds = '2011-03-22-001500' limit 32;

2011/3/30 V.Senthil Kumar vaisen2...@yahoo.com: Hello, I have a Hive query which does a simple select and writes the results to the local file system. For example, a query like this:

  INSERT OVERWRITE LOCAL DIRECTORY '/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
  Select host, identity, user, time, request from raw_apachelog where ds = '2011-03-22-001500';

Now this creates two files under the apachetest folder. This table has only 32 rows. Is there any way I can make Hive create only a single file? Appreciate your help :) Thanks, Senthil
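For what it's worth, the same switches can also be flipped per session from the Hive CLI instead of hive-site.xml; a small sketch using the query from this thread (assuming both settings are in effect before the query runs):

  set hive.merge.mapredfiles=true;
  set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
  INSERT OVERWRITE LOCAL DIRECTORY '/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
  SELECT host, identity, user, time, request FROM raw_apachelog WHERE ds = '2011-03-22-001500';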
Re: how to convert single line into multiple lines in a serde (txt in txt out)?
Thanks Edward. That'll work, but it also means two tables will be created. What if we only want one table, using some SerDe such that it reads the Apache web log and generates multiple rows for each entry line in the log, which get loaded into the target table? Is it doable by customizing RegexSerDe? I.e., create external table A (...) row format serde 'serdeclass' with serdeproperties (...) stored as textfile location 'pathtoapachelog'; would give a table with the right fields extracted and multiple rows generated for later querying. If I cannot create such a SerDe without creating a second table for the task, I think streaming is a better choice from a source code management standpoint: using a SerDe requires you to manage more libraries (hadoop, hive, ...) for the build. Thanks!

On Wed, Mar 30, 2011 at 1:16 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

On Wed, Mar 30, 2011 at 3:46 PM, Michael Jiang it.mjji...@gmail.com wrote: Also, what if I want just one step to load each log entry line from the log file and generate multiple lines for each? That is, just one table created. I don't want to have one table and then call explode() to get multiple lines. Otherwise, the alternative is to use streaming on the loaded table to turn it into another one, with no need to customize a SerDe. So, yeah, the goal here is to see how a SerDe can do this. Thanks!

On Wed, Mar 30, 2011 at 12:03 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

On Wed, Mar 30, 2011 at 2:55 PM, Michael Jiang it.mjji...@gmail.com wrote: Want to extend RegexSerDe to parse Apache web logs: for each log entry, I need to convert it into multiple entries. This is easy in streaming, but I'm new to SerDes, so I'm wondering if it is doable and how? Thanks!

You can have your SerDe produce a list<struct> and then explode() them.

The role of a SerDe is to take the output from the InputFormat and use the information inside the metastore to decode it. As a result it is not a good fit for turning a single row into multiple rows. What I am suggesting is to define a column like this: create table ...( id int, list<string> log_entries ) row format serde ... Make sure your SerDe decodes and populates log_entries. From there you can use lateral view and explode, http://wiki.apache.org/hadoop/Hive/LanguageManual/LateralView, to turn the list<string> into rows. Edward
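A rough sketch of what Edward describes, assuming a hypothetical SerDe class name and made-up table, column and path names, with the SerDe populating log_entries as a list of strings:

  -- one table, backed by the custom SerDe
  CREATE EXTERNAL TABLE raw_log (id INT, log_entries ARRAY<STRING>)
  ROW FORMAT SERDE 'com.example.MultiEntryLogSerDe'
  STORED AS TEXTFILE
  LOCATION '/logs/apache';

  -- lateral view + explode() turns each array element into its own row at query time
  SELECT id, entry
  FROM raw_log
  LATERAL VIEW explode(log_entries) entries AS entry;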