Re: Hive External Table issue

2013-06-20 Thread Stephen Sprague
I agree.

Conclusion: unless you're some kind of Hive guru, use a directory location
and get that to work before trying to get clever with file locations,
especially when you see an error message about "not a directory or unable
to create one". :) Walk before you run, good people.
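
For instance, a minimal sketch of that directory-first approach (the table
name access_raw and the single-column layout are just for illustration; the
path is the day-level directory from this thread):

-- Simplest possible external table over the directory: Hive reads every
-- file under the directory, and each log line lands in the single string
-- column (assuming the default field delimiter never appears in the data).
CREATE EXTERNAL TABLE access_raw (
  line STRING
)
STORED AS TEXTFILE
LOCATION '/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/';

Once that returns rows, the RegexSerDe columns and any per-file tricks can be
layered on top.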


On Thu, Jun 20, 2013 at 11:55 AM, Nitin Pawar wrote:

> Ramki,
>
> I was going through that thread earlier, since Sanjeev said it worked, and
> I was doing some experiments as well.
> Like you, I was under the impression that Hive tables are associated with
> directories, and as pointed out, I was wrong.
>
> Basically, the idea of pointing a table to a file, as mentioned in that
> thread, is a kind of hack:
> create the table without a location,
> then alter the table to point to the file.
>
> From Mark's answer, what he suggests is that we can use the virtual column
> INPUT__FILE__NAME to select which file we want to use while querying, in
> case there are multiple files inside a directory and you only want to use
> a specific one.
>
> The bug I mentioned is about selecting particular files from a directory
> that match a regex, not about the regex SerDe.
>
> Correct my understanding if I got anything wrong.
>
>
>
>
> On Fri, Jun 21, 2013 at 12:04 AM, Ramki Palle wrote:
>
>> Nitin,
>>
>> Can you go through the thread with the subject "S3/EMR Hive: Load contents
>> of a single file" (Tue, 26 Mar, 17:11) at
>>
>>
>> http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/thread?1
>>
>>  This gives the whole discussion about the topic of table location
>> pointing to a filename vs. directory.
>>
>> Can you give your insight from this discussion and from the discussion in
>> the Stack Overflow link you mentioned?
>>
>> Regards,
>> Ramki.
>>
>>
>>
>> On Thu, Jun 20, 2013 at 11:14 AM, Nitin Pawar wrote:
>>
>>> Also see this JIRA
>>> https://issues.apache.org/jira/browse/HIVE-951
>>>
>>> I think the issue you are facing is due to that JIRA.
>>>
>>>
>>> On Thu, Jun 20, 2013 at 11:41 PM, Nitin Pawar 
>>> wrote:
>>>
>>>> Mark has answered this before
>>>>
>>>> http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil
>>>>
>>>> If this link does not answer your question, do let us know
>>>>
>>>>
>>>> On Thu, Jun 20, 2013 at 11:33 PM, sanjeev sagar <
>>>> sanjeev.sa...@gmail.com> wrote:
>>>>
>>>>> Two issues:
>>>>>
>>>>> 1. I've created external tables in Hive based on a file location before,
>>>>> and it worked without any issue. It doesn't have to be a directory.
>>>>>
>>>>> 2. If there is more than one file in the directory and you create an
>>>>> external table based on the directory, how does the table know which file
>>>>> it needs to look in for the data?
>>>>>
>>>>> I tried to create the table based on the directory; it created the table,
>>>>> but all the rows were NULL.
>>>>>
>>>>> -Sanjeev
>>>>>
>>>>>
>>>>> On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar >>>> > wrote:
>>>>>
>>>>>> In Hive, when you create a table and use LOCATION to refer to an HDFS
>>>>>> path, that path is supposed to be a directory.
>>>>>> If the directory does not exist, Hive will try to create it; if the path
>>>>>> is a file, it will throw an error because it is not a directory.
>>>>>>
>>>>>> That's the error you are getting: the location you referred to is a file.
>>>>>> Change it to the directory and see if that works for you.
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar <
>>>>>> sanjeev.sa...@gmail.com> wrote:
>>>>>>
>>>>>>> I did mention in my mail that the HDFS file exists in that location.
>>>>>>> See below:
>>>>>>>
>>>>>>> In HDFS: file exists
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> hadoop fs -ls
>>>>>>>
>>>>>>> /user/flume/events/request_logs/
>>>>>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>>>>
>>>>>>> Found 1 items
>>>>>>>

Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
Ramki,

I was going through that thread earlier, since Sanjeev said it worked, and I
was doing some experiments as well.
Like you, I was under the impression that Hive tables are associated with
directories, and as pointed out, I was wrong.

Basically, the idea of pointing a table to a file, as mentioned in that
thread, is a kind of hack:
create the table without a location,
then alter the table to point to the file.

From Mark's answer, what he suggests is that we can use the virtual column
INPUT__FILE__NAME to select which file we want to use while querying, in
case there are multiple files inside a directory and you only want to use a
specific one.
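
A rough sketch of both approaches (the table name access_one is hypothetical,
the file path is the one from this thread, the SELECT assumes an access table
whose LOCATION is the parent directory, and whether a file LOCATION is
accepted at all is exactly what this thread is debating):

-- Hack from the linked thread: create the table without a LOCATION,
-- then alter it afterwards to point at a single file.
CREATE EXTERNAL TABLE access_one (line STRING) STORED AS TEXTFILE;
ALTER TABLE access_one SET LOCATION
'hdfs://h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';

-- Mark's suggestion: keep LOCATION as the directory and filter on the
-- virtual column INPUT__FILE__NAME at query time.
SELECT *
FROM access
WHERE INPUT__FILE__NAME LIKE '%FlumeData.1371144648033';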

The bug I mentioned is about selecting particular files from a directory
that match a regex, not about the regex SerDe.

Correct my understanding if I got anything wrong




On Fri, Jun 21, 2013 at 12:04 AM, Ramki Palle  wrote:

> Nitin,
>
> Can you go through the thread with the subject "S3/EMR Hive: Load contents
> of a single file" (Tue, 26 Mar, 17:11) at
>
>
> http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/thread?1
>
>  This gives the whole discussion about the topic of table location
> pointing to a filename vs. directory.
>
> Can you give your insight from this discussion and from the discussion in
> the Stack Overflow link you mentioned?
>
> Regards,
> Ramki.
>
>
>
> On Thu, Jun 20, 2013 at 11:14 AM, Nitin Pawar wrote:
>
>> Also see this JIRA
>> https://issues.apache.org/jira/browse/HIVE-951
>>
>> I think the issue you are facing is due to that JIRA.
>>
>>
>> On Thu, Jun 20, 2013 at 11:41 PM, Nitin Pawar wrote:
>>
>>> Mark has answered this before
>>>
>>> http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil
>>>
>>> If this link does not answer your question, do let us know
>>>
>>>
>>> On Thu, Jun 20, 2013 at 11:33 PM, sanjeev sagar >> > wrote:
>>>
>>>> Two issues:
>>>>
>>>> 1. I've created external tables in Hive based on a file location before,
>>>> and it worked without any issue. It doesn't have to be a directory.
>>>>
>>>> 2. If there is more than one file in the directory and you create an
>>>> external table based on the directory, how does the table know which file
>>>> it needs to look in for the data?
>>>>
>>>> I tried to create the table based on the directory; it created the table,
>>>> but all the rows were NULL.
>>>>
>>>> -Sanjeev
>>>>
>>>>
>>>> On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar 
>>>> wrote:
>>>>
>>>>> In Hive, when you create a table and use LOCATION to refer to an HDFS
>>>>> path, that path is supposed to be a directory.
>>>>> If the directory does not exist, Hive will try to create it; if the path
>>>>> is a file, it will throw an error because it is not a directory.
>>>>>
>>>>> That's the error you are getting: the location you referred to is a file.
>>>>> Change it to the directory and see if that works for you.
>>>>>
>>>>>
>>>>> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar <
>>>>> sanjeev.sa...@gmail.com> wrote:
>>>>>
>>>>>> I did mention in my mail that the HDFS file exists in that location.
>>>>>> See below:
>>>>>>
>>>>>> In HDFS: file exists
>>>>>>
>>>>>>
>>>>>>
>>>>>> hadoop fs -ls
>>>>>>
>>>>>> /user/flume/events/request_logs/
>>>>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>>>
>>>>>> Found 1 items
>>>>>>
>>>>>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>>>>>
>>>>>> /user/flume/events/request_logs/
>>>>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>>>
>>>>>> so the directory and the file both exist.
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar <
>>>>>> nitinpawar...@gmail.com> wrote:
>>>>>>
>>>>>>> MetaException(message:hdfs://
>>>>>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>>>>
>>>>>>>

Re: Hive External Table issue

2013-06-20 Thread Ramki Palle
Nitin,

Can you go through the thread with the subject "S3/EMR Hive: Load contents of
a single file" (Tue, 26 Mar, 17:11) at


http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/thread?1

 This gives the whole discussion about the topic of table location pointing
to a filename vs. directory.

Can you give your insight from this discussion and from the discussion in the
Stack Overflow link you mentioned?

Regards,
Ramki.



On Thu, Jun 20, 2013 at 11:14 AM, Nitin Pawar wrote:

> Also see this JIRA
> https://issues.apache.org/jira/browse/HIVE-951
>
> I think the issue you are facing is due to that JIRA.
>
>
> On Thu, Jun 20, 2013 at 11:41 PM, Nitin Pawar wrote:
>
>> Mark has answered this before
>>
>> http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil
>>
>> If this link does not answer your question, do let us know
>>
>>
>> On Thu, Jun 20, 2013 at 11:33 PM, sanjeev sagar 
>> wrote:
>>
>>> Two issues:
>>>
>>> 1. I've created external tables in Hive based on a file location before,
>>> and it worked without any issue. It doesn't have to be a directory.
>>>
>>> 2. If there is more than one file in the directory and you create an
>>> external table based on the directory, how does the table know which file
>>> it needs to look in for the data?
>>>
>>> I tried to create the table based on the directory; it created the table,
>>> but all the rows were NULL.
>>>
>>> -Sanjeev
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar 
>>> wrote:
>>>
>>>> In Hive, when you create a table and use LOCATION to refer to an HDFS
>>>> path, that path is supposed to be a directory.
>>>> If the directory does not exist, Hive will try to create it; if the path
>>>> is a file, it will throw an error because it is not a directory.
>>>>
>>>> That's the error you are getting: the location you referred to is a file.
>>>> Change it to the directory and see if that works for you.
>>>>
>>>>
>>>> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar <
>>>> sanjeev.sa...@gmail.com> wrote:
>>>>
>>>>> I did mention in my mail that the HDFS file exists in that location.
>>>>> See below:
>>>>>
>>>>> In HDFS: file exists
>>>>>
>>>>>
>>>>>
>>>>> hadoop fs -ls
>>>>>
>>>>> /user/flume/events/request_logs/
>>>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>>
>>>>> Found 1 items
>>>>>
>>>>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>>>>
>>>>> /user/flume/events/request_logs/
>>>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>>
>>>>> so the directory and the file both exist.
>>>>>
>>>>>
>>>>> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar >>>> > wrote:
>>>>>
>>>>>> MetaException(message:hdfs://
>>>>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>>>
>>>>>> is not a directory or unable to create one)
>>>>>>
>>>>>>
>>>>>> It clearly says it's not a directory. Point to the directory and it
>>>>>> will work.
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar <
>>>>>> sanjeev.sa...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello Everyone, I'm running into the following Hive external table
>>>>>>> issue.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> hive> CREATE EXTERNAL TABLE access(
>>>>>>>
>>>>>>>  >   host STRING,
>>>>>>>
>>>>>>>  >   identity STRING,
>>>>>>>
>>>>>>>  >   user STRING,
>>>>>>>
>>>>>>>  >   time STRING,
>>>>>>>
>>>>>>>  >   request STRING,
>>>>>>>
>>>>>>>  >   status STRING,
>>>>>>>
>>>>>>>  >   size ST

Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
Also see this JIRA
https://issues.apache.org/jira/browse/HIVE-951

I think the issue you are facing is due to that JIRA.


On Thu, Jun 20, 2013 at 11:41 PM, Nitin Pawar wrote:

> Mark has answered this before
>
> http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil
>
> If this link does not answer your question, do let us know
>
>
> On Thu, Jun 20, 2013 at 11:33 PM, sanjeev sagar 
> wrote:
>
>> Two issues:
>>
>> 1. I've created external tables in Hive based on a file location before,
>> and it worked without any issue. It doesn't have to be a directory.
>>
>> 2. If there is more than one file in the directory and you create an
>> external table based on the directory, how does the table know which file
>> it needs to look in for the data?
>>
>> I tried to create the table based on the directory; it created the table,
>> but all the rows were NULL.
>>
>> -Sanjeev
>>
>>
>> On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar wrote:
>>
>>> In Hive, when you create a table and use LOCATION to refer to an HDFS
>>> path, that path is supposed to be a directory.
>>> If the directory does not exist, Hive will try to create it; if the path
>>> is a file, it will throw an error because it is not a directory.
>>>
>>> That's the error you are getting: the location you referred to is a file.
>>> Change it to the directory and see if that works for you.
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar >> > wrote:
>>>
>>>> I did mention in my mail that the HDFS file exists in that location.
>>>> See below:
>>>>
>>>> In HDFS: file exists
>>>>
>>>>
>>>>
>>>> hadoop fs -ls
>>>>
>>>> /user/flume/events/request_logs/
>>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>
>>>> Found 1 items
>>>>
>>>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>>>
>>>> /user/flume/events/request_logs/
>>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>
>>>> so the directory and the file both exist.
>>>>
>>>>
>>>> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar 
>>>> wrote:
>>>>
>>>>> MetaException(message:hdfs://
>>>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>>
>>>>> is not a directory or unable to create one)
>>>>>
>>>>>
>>>>> It clearly says it's not a directory. Point to the directory and it
>>>>> will work.
>>>>>
>>>>>
>>>>> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar <
>>>>> sanjeev.sa...@gmail.com> wrote:
>>>>>
>>>>>> Hello Everyone, I'm running into the following Hive external table
>>>>>> issue.
>>>>>>
>>>>>>
>>>>>>
>>>>>> hive> CREATE EXTERNAL TABLE access(
>>>>>>
>>>>>>  >   host STRING,
>>>>>>
>>>>>>  >   identity STRING,
>>>>>>
>>>>>>  >   user STRING,
>>>>>>
>>>>>>  >   time STRING,
>>>>>>
>>>>>>  >   request STRING,
>>>>>>
>>>>>>  >   status STRING,
>>>>>>
>>>>>>  >   size STRING,
>>>>>>
>>>>>>  >   referer STRING,
>>>>>>
>>>>>>  >   agent STRING)
>>>>>>
>>>>>>  >   ROW FORMAT SERDE
>>>>>>
>>>>>> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>>>>>>
>>>>>>  >   WITH SERDEPROPERTIES (
>>>>>>
>>>>>>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*)
>>>>>> (-|\\[[^\\]]*\\])
>>>>>>
>>>>>> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\")
>>>>>> ([^ \"]*|\"[^\"]*\"))?",
>>>>>>
>>>>>>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %

Re: Hive External Table issue

2013-06-20 Thread Ramki Palle
1. I was under the impression that you cannot point the table location to a
file, but it looks like it works. Please see the discussion in the thread
 http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/%
3c556325346ca26341b6f0530e07f90d96017084360...@gbgh-exch-cms.sig.ads%3e

2. If there is more than one file in the directory, your query gets the
data from all the files in that directory.

In your case, the regex may not be parsing the data properly.
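
One quick way to check that (a sketch; it assumes an access table defined over
the parent directory, with the RegexSerDe jar already on the classpath):

-- Rows the regex cannot parse typically come back with NULL in every
-- column, so all-NULL results usually point at input.regex rather than
-- at a missing file.
SELECT host, status, request
FROM access
LIMIT 5;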

Regards,
Ramki.


On Thu, Jun 20, 2013 at 11:03 AM, sanjeev sagar wrote:

> Two issues:
>
> 1. I've created external tables in Hive based on a file location before,
> and it worked without any issue. It doesn't have to be a directory.
>
> 2. If there is more than one file in the directory and you create an
> external table based on the directory, how does the table know which file
> it needs to look in for the data?
>
> I tried to create the table based on the directory; it created the table,
> but all the rows were NULL.
>
> -Sanjeev
>
>
> On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar wrote:
>
>> In Hive, when you create a table and use LOCATION to refer to an HDFS
>> path, that path is supposed to be a directory.
>> If the directory does not exist, Hive will try to create it; if the path
>> is a file, it will throw an error because it is not a directory.
>>
>> That's the error you are getting: the location you referred to is a file.
>> Change it to the directory and see if that works for you.
>>
>>
>> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar 
>> wrote:
>>
>>> I did mention in my mail that the HDFS file exists in that location. See below:
>>>
>>> In HDFS: file exists
>>>
>>>
>>>
>>> hadoop fs -ls
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> Found 1 items
>>>
>>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> so the directory and the file both exist.
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar 
>>> wrote:
>>>
>>>> MetaException(message:hdfs://
>>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>
>>>> is not a directory or unable to create one)
>>>>
>>>>
>>>> It clearly says it's not a directory. Point to the directory and it
>>>> will work.
>>>>
>>>>
>>>> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar <
>>>> sanjeev.sa...@gmail.com> wrote:
>>>>
>>>>> Hello Everyone, I'm running into the following Hive external table
>>>>> issue.
>>>>>
>>>>>
>>>>>
>>>>> hive> CREATE EXTERNAL TABLE access(
>>>>>
>>>>>  >   host STRING,
>>>>>
>>>>>  >   identity STRING,
>>>>>
>>>>>  >   user STRING,
>>>>>
>>>>>  >   time STRING,
>>>>>
>>>>>  >   request STRING,
>>>>>
>>>>>  >   status STRING,
>>>>>
>>>>>  >   size STRING,
>>>>>
>>>>>  >   referer STRING,
>>>>>
>>>>>  >   agent STRING)
>>>>>
>>>>>  >   ROW FORMAT SERDE
>>>>>
>>>>> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>>>>>
>>>>>  >   WITH SERDEPROPERTIES (
>>>>>
>>>>>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])
>>>>>
>>>>> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
>>>>> \"]*|\"[^\"]*\"))?",
>>>>>
>>>>>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>>>>>
>>>>> %7$s %8$s %9$s"
>>>>>
>>>>>  >   )
>>>>>
>>>>>  >   STORED AS TEXTFILE
>>>>>
>>>>>  >   LOCATION
>>>>>
>>>>> '/user/flume/events/request_logs/

Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
Mark has answered this before
http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil

If this link does not answer your question, do let us know


On Thu, Jun 20, 2013 at 11:33 PM, sanjeev sagar wrote:

> Two issues:
>
> 1. I've created external tables in Hive based on a file location before,
> and it worked without any issue. It doesn't have to be a directory.
>
> 2. If there is more than one file in the directory and you create an
> external table based on the directory, how does the table know which file
> it needs to look in for the data?
>
> I tried to create the table based on the directory; it created the table,
> but all the rows were NULL.
>
> -Sanjeev
>
>
> On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar wrote:
>
>> In Hive, when you create a table and use LOCATION to refer to an HDFS
>> path, that path is supposed to be a directory.
>> If the directory does not exist, Hive will try to create it; if the path
>> is a file, it will throw an error because it is not a directory.
>>
>> That's the error you are getting: the location you referred to is a file.
>> Change it to the directory and see if that works for you.
>>
>>
>> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar 
>> wrote:
>>
>>> I did mention in my mail that the HDFS file exists in that location. See below:
>>>
>>> In HDFS: file exists
>>>
>>>
>>>
>>> hadoop fs -ls
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> Found 1 items
>>>
>>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> so the directory and the file both exist.
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar 
>>> wrote:
>>>
>>>> MetaException(message:hdfs://
>>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>
>>>> is not a directory or unable to create one)
>>>>
>>>>
>>>> It clearly says it's not a directory. Point to the directory and it
>>>> will work.
>>>>
>>>>
>>>> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar <
>>>> sanjeev.sa...@gmail.com> wrote:
>>>>
>>>>> Hello Everyone, I'm running into the following Hive external table
>>>>> issue.
>>>>>
>>>>>
>>>>>
>>>>> hive> CREATE EXTERNAL TABLE access(
>>>>>
>>>>>  >   host STRING,
>>>>>
>>>>>  >   identity STRING,
>>>>>
>>>>>  >   user STRING,
>>>>>
>>>>>  >   time STRING,
>>>>>
>>>>>  >   request STRING,
>>>>>
>>>>>  >   status STRING,
>>>>>
>>>>>  >   size STRING,
>>>>>
>>>>>  >   referer STRING,
>>>>>
>>>>>  >   agent STRING)
>>>>>
>>>>>  >   ROW FORMAT SERDE
>>>>>
>>>>> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>>>>>
>>>>>  >   WITH SERDEPROPERTIES (
>>>>>
>>>>>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])
>>>>>
>>>>> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
>>>>> \"]*|\"[^\"]*\"))?",
>>>>>
>>>>>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>>>>>
>>>>> %7$s %8$s %9$s"
>>>>>
>>>>>  >   )
>>>>>
>>>>>  >   STORED AS TEXTFILE
>>>>>
>>>>>  >   LOCATION
>>>>>
>>>>> '/user/flume/events/request_logs/
>>>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>>>>>
>>>>> FAILED: Error in metadata:
>>>>>
>>>>> MetaException(message:hdfs://
>>>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoint

Re: Hive External Table issue

2013-06-20 Thread sanjeev sagar
Two issues:

1. I've created external tables in Hive based on a file location before, and
it worked without any issue. It doesn't have to be a directory.

2. If there is more than one file in the directory and you create an
external table based on the directory, how does the table know which file
it needs to look in for the data?

I tried to create the table based on the directory; it created the table,
but all the rows were NULL.

-Sanjeev


On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar wrote:

> In Hive, when you create a table and use LOCATION to refer to an HDFS
> path, that path is supposed to be a directory.
> If the directory does not exist, Hive will try to create it; if the path
> is a file, it will throw an error because it is not a directory.
>
> That's the error you are getting: the location you referred to is a file.
> Change it to the directory and see if that works for you.
>
>
> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar 
> wrote:
>
>> I did mention in my mail that the HDFS file exists in that location. See below:
>>
>> In HDFS: file exists
>>
>>
>>
>> hadoop fs -ls
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> Found 1 items
>>
>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> so the directory and the file both exist.
>>
>>
>> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar wrote:
>>
>>> MetaException(message:hdfs://
>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> is not a directory or unable to create one)
>>>
>>>
>>> It clearly says it's not a directory. Point to the directory and it
>>> will work.
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar >> > wrote:
>>>
>>>> Hello Everyone, I'm running into the following Hive external table
>>>> issue.
>>>>
>>>>
>>>>
>>>> hive> CREATE EXTERNAL TABLE access(
>>>>
>>>>  >   host STRING,
>>>>
>>>>  >   identity STRING,
>>>>
>>>>  >   user STRING,
>>>>
>>>>  >   time STRING,
>>>>
>>>>  >   request STRING,
>>>>
>>>>  >   status STRING,
>>>>
>>>>  >   size STRING,
>>>>
>>>>  >   referer STRING,
>>>>
>>>>  >   agent STRING)
>>>>
>>>>  >   ROW FORMAT SERDE
>>>>
>>>> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>>>>
>>>>  >   WITH SERDEPROPERTIES (
>>>>
>>>>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])
>>>>
>>>> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
>>>> \"]*|\"[^\"]*\"))?",
>>>>
>>>>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>>>>
>>>> %7$s %8$s %9$s"
>>>>
>>>>  >   )
>>>>
>>>>  >   STORED AS TEXTFILE
>>>>
>>>>  >   LOCATION
>>>>
>>>> '/user/flume/events/request_logs/
>>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>>>>
>>>> FAILED: Error in metadata:
>>>>
>>>> MetaException(message:hdfs://
>>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>
>>>> is not a directory or unable to create one)
>>>>
>>>> FAILED: Execution Error, return code 1 from
>>>> org.apache.hadoop.hive.ql.exec.DDLTask
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> In HDFS: file exists
>>>>
>>>>
>>>>
>>>> hadoop fs -ls
>>>>
>>>> /user/flume/events/request_logs/
>>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>
>>>> Found 1 items
>>>>
>>>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>>>
>>>> /user/flume/events/request_logs/
>>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>>
>>>>
>>>>
>>>> I've downloaded the SerDe jar file too, installed it at
>>>> /usr/lib/hive/lib/hive-json-serde-0.2.jar, and I've bounced all the Hadoop
>>>> services after that.
>>>>
>>>> I even added the jar file manually in Hive and ran the above SQL, but it
>>>> is still failing.
>>>>
>>>> hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar
>>>>
>>>>  > ;
>>>>
>>>> Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
>>>> resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar
>>>>
>>>>
>>>>
>>>> Any help would be highly appreciated.
>>>>
>>>>
>>>>
>>>> -Sanjeev
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sanjeev Sagar
>>>>
>>>> "Separate yourself from everything that separates you from others!"
>>>> - Nirankari Baba Hardev Singh ji
>>>>
>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>>
>> --
>> Sanjeev Sagar
>>
>> "Separate yourself from everything that separates you from others!"
>> - Nirankari Baba Hardev Singh ji
>>
>
>
>
> --
> Nitin Pawar
>



-- 
Sanjeev Sagar

"Separate yourself from everything that separates you from others!"
- Nirankari Baba Hardev Singh ji


Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
In Hive, when you create a table and use LOCATION to refer to an HDFS path,
that path is supposed to be a directory.
If the directory does not exist, Hive will try to create it; if the path is a
file, it will throw an error because it is not a directory.

That's the error you are getting: the location you referred to is a file.
Change it to the directory and see if that works for you.
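
Concretely, that means re-running the DDL from the original message unchanged
except for trimming LOCATION back to the day-level directory (a sketch;
whether the regex then parses the rows is a separate question, discussed
elsewhere in this thread):

-- Same DDL as in the original message, but LOCATION now ends at the
-- directory rather than at the FlumeData file inside it.
CREATE EXTERNAL TABLE access(
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE
LOCATION '/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/';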


On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar wrote:

> I did mention in my mail that the HDFS file exists in that location. See below:
>
> In HDFS: file exists
>
>
>
> hadoop fs -ls
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> Found 1 items
>
> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> so the directory and the file both exist.
>
>
> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar wrote:
>
>> MetaException(message:hdfs://
>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> is not a directory or unable to create one)
>>
>>
>> It clearly says it's not a directory. Point to the directory and it
>> will work.
>>
>>
>> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar 
>> wrote:
>>
>>> Hello Everyone, I'm running into the following Hive external table issue.
>>>
>>>
>>>
>>> hive> CREATE EXTERNAL TABLE access(
>>>
>>>  >   host STRING,
>>>
>>>  >   identity STRING,
>>>
>>>  >   user STRING,
>>>
>>>  >   time STRING,
>>>
>>>  >   request STRING,
>>>
>>>  >   status STRING,
>>>
>>>  >   size STRING,
>>>
>>>  >   referer STRING,
>>>
>>>  >   agent STRING)
>>>
>>>  >   ROW FORMAT SERDE
>>>
>>> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>>>
>>>  >   WITH SERDEPROPERTIES (
>>>
>>>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])
>>>
>>> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
>>> \"]*|\"[^\"]*\"))?",
>>>
>>>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>>>
>>> %7$s %8$s %9$s"
>>>
>>>  >   )
>>>
>>>  >   STORED AS TEXTFILE
>>>
>>>  >   LOCATION
>>>
>>> '/user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>>>
>>> FAILED: Error in metadata:
>>>
>>> MetaException(message:hdfs://
>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> is not a directory or unable to create one)
>>>
>>> FAILED: Execution Error, return code 1 from
>>> org.apache.hadoop.hive.ql.exec.DDLTask
>>>
>>>
>>>
>>>
>>>
>>> In HDFS: file exists
>>>
>>>
>>>
>>> hadoop fs -ls
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> Found 1 items
>>>
>>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>>
>>>
>>> I've downloaded the SerDe jar file too, installed it at
>>> /usr/lib/hive/lib/hive-json-serde-0.2.jar, and I've bounced all the Hadoop
>>> services after that.
>>>
>>> I even added the jar file manually in Hive and ran the above SQL, but it
>>> is still failing.
>>>
>>> hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar
>>>
>>>  > ;
>>>
>>> Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
>>> resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar
>>>
>>>
>>>
>>> Any help would be highly appreciated.
>>>
>>>
>>>
>>> -Sanjeev
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Sanjeev Sagar
>>>
>>> "Separate yourself from everything that separates you from others!"
>>> - Nirankari Baba Hardev Singh ji
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
>
> --
> Sanjeev Sagar
>
> "Separate yourself from everything that separates you from others!"
> - Nirankari Baba Hardev Singh ji
>



-- 
Nitin Pawar


Re: Hive External Table issue

2013-06-20 Thread sanjeev sagar
I did mention in my mail that the HDFS file exists in that location. See below:

In HDFS: file exists



hadoop fs -ls

/user/flume/events/request_logs/
ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

Found 1 items

-rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14

/user/flume/events/request_logs/
ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

so the directory and the file both exist.


On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar wrote:

> MetaException(message:hdfs://
> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> is not a directory or unable to create one)
>
>
> It clearly says it's not a directory. Point to the directory and it will work.
>
>
> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar 
> wrote:
>
>> Hello Everyone, I'm running into the following Hive external table issue.
>>
>>
>>
>> hive> CREATE EXTERNAL TABLE access(
>>
>>  >   host STRING,
>>
>>  >   identity STRING,
>>
>>  >   user STRING,
>>
>>  >   time STRING,
>>
>>  >   request STRING,
>>
>>  >   status STRING,
>>
>>  >   size STRING,
>>
>>  >   referer STRING,
>>
>>  >   agent STRING)
>>
>>  >   ROW FORMAT SERDE
>>
>> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>>
>>  >   WITH SERDEPROPERTIES (
>>
>>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])
>>
>> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
>> \"]*|\"[^\"]*\"))?",
>>
>>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>>
>> %7$s %8$s %9$s"
>>
>>  >   )
>>
>>  >   STORED AS TEXTFILE
>>
>>  >   LOCATION
>>
>> '/user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>>
>> FAILED: Error in metadata:
>>
>> MetaException(message:hdfs://
>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> is not a directory or unable to create one)
>>
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>>
>>
>>
>>
>>
>> In HDFS: file exists
>>
>>
>>
>> hadoop fs -ls
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> Found 1 items
>>
>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>>
>>
>> I've downloaded the SerDe jar file too, installed it at
>> /usr/lib/hive/lib/hive-json-serde-0.2.jar, and I've bounced all the Hadoop
>> services after that.
>>
>> I even added the jar file manually in Hive and ran the above SQL, but it
>> is still failing.
>>
>> hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar
>>
>>  > ;
>>
>> Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
>> resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar
>>
>>
>>
>> Any help would be highly appreciated.
>>
>>
>>
>> -Sanjeev
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Sanjeev Sagar
>>
>> "Separate yourself from everything that separates you from others!"
>> - Nirankari Baba Hardev Singh ji
>>
>
>
>
> --
> Nitin Pawar
>



-- 
Sanjeev Sagar

"Separate yourself from everything that separates you from others!"
- Nirankari Baba Hardev Singh ji


Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
MetaException(message:hdfs://
h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

is not a directory or unable to create one)


It clearly says it's not a directory. Point to the directory and it will work.


On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar wrote:

> Hello Everyone, I'm running into the following Hive external table issue.
>
>
>
> hive> CREATE EXTERNAL TABLE access(
>
>  >   host STRING,
>
>  >   identity STRING,
>
>  >   user STRING,
>
>  >   time STRING,
>
>  >   request STRING,
>
>  >   status STRING,
>
>  >   size STRING,
>
>  >   referer STRING,
>
>  >   agent STRING)
>
>  >   ROW FORMAT SERDE
>
> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>
>  >   WITH SERDEPROPERTIES (
>
>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])
>
> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
> \"]*|\"[^\"]*\"))?",
>
>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>
> %7$s %8$s %9$s"
>
>  >   )
>
>  >   STORED AS TEXTFILE
>
>  >   LOCATION
>
> '/user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>
> FAILED: Error in metadata:
>
> MetaException(message:hdfs://
> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> is not a directory or unable to create one)
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask
>
>
>
>
>
> In HDFS: file exists
>
>
>
> hadoop fs -ls
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> Found 1 items
>
> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
>
>
> I've downloaded the SerDe jar file too, installed it at
> /usr/lib/hive/lib/hive-json-serde-0.2.jar, and I've bounced all the Hadoop
> services after that.
>
> I even added the jar file manually in Hive and ran the above SQL, but it
> is still failing.
>
> hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar
>
>  > ;
>
> Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
> resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar
>
>
>
> Any help would be highly appreciated.
>
>
>
> -Sanjeev
>
>
>
>
>
>
>
>
>
> --
> Sanjeev Sagar
>
> "Separate yourself from everything that separates you from others!"
> - Nirankari Baba Hardev Singh ji
>



-- 
Nitin Pawar


Hive External Table issue

2013-06-20 Thread sanjeev sagar
Hello Everyone, I'm running into the following Hive external table issue.



hive> CREATE EXTERNAL TABLE access(

 >   host STRING,

 >   identity STRING,

 >   user STRING,

 >   time STRING,

 >   request STRING,

 >   status STRING,

 >   size STRING,

 >   referer STRING,

 >   agent STRING)

 >   ROW FORMAT SERDE

'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

 >   WITH SERDEPROPERTIES (

 >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])

([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
\"]*|\"[^\"]*\"))?",

 >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s

%7$s %8$s %9$s"

 >   )

 >   STORED AS TEXTFILE

 >   LOCATION

'/user/flume/events/request_logs/
ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';

FAILED: Error in metadata:

MetaException(message:hdfs://
h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

is not a directory or unable to create one)

FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask





In HDFS: file exists



hadoop fs -ls

/user/flume/events/request_logs/
ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

Found 1 items

-rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14

/user/flume/events/request_logs/
ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033



I've downloaded the SerDe jar file too, installed it at
/usr/lib/hive/lib/hive-json-serde-0.2.jar, and I've bounced all the Hadoop
services after that.

I even added the jar file manually in Hive and ran the above SQL, but it is
still failing.

hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar

 > ;

Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar
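
For reference, org.apache.hadoop.hive.contrib.serde2.RegexSerDe normally ships
in the hive-contrib jar rather than in a JSON SerDe jar, so a sketch along
these lines may also be needed (the path and version below are assumptions,
not taken from this thread):

-- Hypothetical path/version: the hive-contrib jar name varies by
-- distribution and is what provides the contrib RegexSerDe class.
ADD JAR /usr/lib/hive/lib/hive-contrib-0.10.0.jar;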



Any help would be highly appreciated.



-Sanjeev









-- 
Sanjeev Sagar

"Separate yourself from everything that separates you from others!"
- Nirankari Baba Hardev Singh ji