[jira] [Assigned] (HIVE-13275) Add a toString method to BytesRefArrayWritable

2018-05-15 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reassigned HIVE-13275:
--

Assignee: (was: Harsh J)

> Add a toString method to BytesRefArrayWritable
> --
>
> Key: HIVE-13275
> URL: https://issues.apache.org/jira/browse/HIVE-13275
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 1.1.0
>Reporter: Harsh J
>Priority: Trivial
> Attachments: HIVE-13275.000.patch
>
>
> RCFileInputFormat cannot be used externally for Hadoop Streaming today because 
> Streaming generally relies on the K/V pairs being able to emit text 
> representations (via toString()).
> Since BytesRefArrayWritable has no toString() method, using 
> RCFileInputFormat produces default object-representation prints, which are not 
> useful.
> Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an 
> array), so it's important to output them in a valid, parseable manner, as 
> opposed to choosing a simple joining delimiter over the string 
> representations of the inner elements.
> I propose adding a standardised CSV formatting of the array data, so that 
> users of Streaming can then parse the results in their own scripts. Since we 
> already have OpenCSV as a dependency, we can make use of it for this purpose.
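
For illustration only (the attached HIVE-13275.000.patch is not reproduced here): a 
minimal sketch of the proposed CSV rendering, assuming the au.com.bytecode.opencsv 
CSVWriter API already on the classpath and UTF-8 column bytes; the RowToCsv class and 
the List<byte[]> input are stand-ins for iterating a BytesRefArrayWritable's entries.

{code}
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

import au.com.bytecode.opencsv.CSVWriter;

public final class RowToCsv {
  // Renders one row's columns as a single CSV line so that embedded commas,
  // quotes, etc. are escaped uniformly by OpenCSV rather than by an ad-hoc
  // joining delimiter.
  public static String toCsvLine(List<byte[]> columns) throws IOException {
    String[] fields = new String[columns.size()];
    for (int i = 0; i < columns.size(); i++) {
      fields[i] = new String(columns.get(i), StandardCharsets.UTF_8);
    }
    StringWriter out = new StringWriter();
    CSVWriter csv = new CSVWriter(out);
    csv.writeNext(fields);
    csv.close();
    return out.toString().trim(); // CSVWriter appends a line terminator
  }
}
{code}

A toString() on BytesRefArrayWritable could delegate to something like this, and 
Streaming scripts could then parse each emitted line with any standard CSV reader.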



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HIVE-8554) Hive Server 2 should support multiple authentication types at the same time

2017-03-30 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HIVE-8554.
---
Resolution: Duplicate

> Hive Server 2 should support multiple authentication types at the same time
> ---
>
> Key: HIVE-8554
> URL: https://issues.apache.org/jira/browse/HIVE-8554
> Project: Hive
>  Issue Type: Bug
>Reporter: Joey Echeverria
>
> It's very common for clusters to use LDAP/Active Directory as an identity 
> provider while also using Kerberos authentication. It would be 
> useful if users could seamlessly switch between LDAP username/password 
> authentication and Kerberos authentication without having to run multiple 
> Hive Server 2 instances.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-15908:
---
Assignee: (was: Harsh J)
  Status: Open  (was: Patch Available)

> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Harsh J
>Priority: Minor
> Attachments: HIVE-15908.000.patch
>
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader inside the OperationLog$LogFile class 
> reads its input stream line by line, picking up whatever lines are available 
> from the OS's file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used leaves PrintStream's 
> {{autoFlush}} feature turned off. This causes the BufferedWriter used by 
> PrintStream to accumulate up to 8 KB of data in its in-memory buffer before 
> flushing writes to disk, which slows down the logs streamed back to 
> the client. Ideally, every line should be flushed as it is written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-17 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872048#comment-15872048
 ] 

Harsh J commented on HIVE-15908:


After a bit more testing with some slow-logging queries: aside from just this 
newline flush on the server side, increasing the fetch size in HiveStatement to 
a very large value (1000 rows) also helps, as does decreasing the Beeline 
Command class's UI-jarring 1-second pause between fetches to something like 
100-200 ms.

I'm unsure if such changes are acceptable though, as they'd increase the 
running load on the HS2 given overall beeline usage. FWIW, Hue feels more 
pleasant to use; it polls the query logs with a fetch size of 1000 rows 
and a dynamic refresh sleep time that begins at 100 ms and scales up to 2 s 
over time, in increments of 100 ms (this works better because there's more 
logging at the beginning of the query than around the end).
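
As a rough sketch of that Hue-style polling pattern (the 1000-row fetch, 100 ms 
starting sleep, 100 ms increment and 2 s cap are the numbers mentioned above; the 
fetchLogs/isFinished hooks are hypothetical placeholders, not an actual HiveStatement 
or Beeline API):

{code}
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.IntFunction;

public final class LogPoller {
  // Polls operation logs with a sleep that starts at 100 ms and grows by
  // 100 ms per iteration up to a 2 s cap, so polling is aggressive early in
  // the query (when most logging happens) and backs off later.
  public static void poll(IntFunction<List<String>> fetchLogs,
                          BooleanSupplier isFinished) throws InterruptedException {
    long sleepMs = 100;
    while (!isFinished.getAsBoolean()) {
      for (String line : fetchLogs.apply(1000)) { // fetch size of 1000 rows
        System.out.println(line);
      }
      Thread.sleep(sleepMs);
      sleepMs = Math.min(sleepMs + 100, 2000);
    }
  }
}
{code}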

> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-15908.000.patch
>
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader inside the OperationLog$LogFile class 
> reads its input stream line by line, picking up whatever lines are available 
> from the OS's file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used leaves PrintStream's 
> {{autoFlush}} feature turned off. This causes the BufferedWriter used by 
> PrintStream to accumulate up to 8 KB of data in its in-memory buffer before 
> flushing writes to disk, which slows down the logs streamed back to 
> the client. Ideally, every line should be flushed as it is written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-13 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865281#comment-15865281
 ] 

Harsh J commented on HIVE-15908:


(H/T to Lingesh Radhakrishnan for the discovery)

> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-15908.000.patch
>
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader inside the OperationLog$LogFile class 
> reads its input stream line by line, picking up whatever lines are available 
> from the OS's file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used leaves PrintStream's 
> {{autoFlush}} feature turned off. This causes the BufferedWriter used by 
> PrintStream to accumulate up to 8 KB of data in its in-memory buffer before 
> flushing writes to disk, which slows down the logs streamed back to 
> the client. Ideally, every line should be flushed as it is written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-15908:
---
Affects Version/s: 0.13.0

> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-15908.000.patch
>
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader inside the OperationLog$LogFile class 
> reads its input stream line by line, picking up whatever lines are available 
> from the OS's file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used leaves PrintStream's 
> {{autoFlush}} feature turned off. This causes the BufferedWriter used by 
> PrintStream to accumulate up to 8 KB of data in its in-memory buffer before 
> flushing writes to disk, which slows down the logs streamed back to 
> the client. Ideally, every line should be flushed as it is written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-15908:
---
Status: Patch Available  (was: Open)

> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-15908.000.patch
>
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader inside the OperationLog$LogFile class 
> reads its input stream line by line, picking up whatever lines are available 
> from the OS's file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used leaves PrintStream's 
> {{autoFlush}} feature turned off. This causes the BufferedWriter used by 
> PrintStream to accumulate up to 8 KB of data in its in-memory buffer before 
> flushing writes to disk, which slows down the logs streamed back to 
> the client. Ideally, every line should be flushed as it is written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-15908:
---
Attachment: HIVE-15908.000.patch

> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-15908.000.patch
>
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader inside the OperationLog$LogFile class 
> reads its input stream line by line, picking up whatever lines are available 
> from the OS's file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used leaves PrintStream's 
> {{autoFlush}} feature turned off. This causes the BufferedWriter used by 
> PrintStream to accumulate up to 8 KB of data in its in-memory buffer before 
> flushing writes to disk, which slows down the logs streamed back to 
> the client. Ideally, every line should be flushed as it is written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reassigned HIVE-15908:
--


> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader inside the OperationLog$LogFile class 
> reads its input stream line by line, picking up whatever lines are available 
> from the OS's file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used leaves PrintStream's 
> {{autoFlush}} feature turned off. This causes the BufferedWriter used by 
> PrintStream to accumulate up to 8 KB of data in its in-memory buffer before 
> flushing writes to disk, which slows down the logs streamed back to 
> the client. Ideally, every line should be flushed as it is written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-11325) Infinite loop in HiveHFileOutputFormat

2016-11-15 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reassigned HIVE-11325:
--

Assignee: (was: Harsh J)

> Infinite loop in HiveHFileOutputFormat
> --
>
> Key: HIVE-11325
> URL: https://issues.apache.org/jira/browse/HIVE-11325
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 1.0.0
>Reporter: Harsh J
> Attachments: HIVE-11325.patch
>
>
> No idea why {{hbase_handler_bulk.q}} does not catch this if it's being run 
> regularly in Hive builds, but here's the gist of the issue:
> The condition at 
> https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164
>  indicates that we will loop infinitely until we find a file whose last path 
> component (the name) is equal to the column family name.
> In execution, however, the iteration enters an actual infinite loop because the 
> file we end up considering as the srcDir name is actually the region file, 
> whose name will never match the family name.
> This is an example of the IPC calls that the listing loop of a task at 100% 
> progress gets stuck in:
> {code}
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Call -> cdh54.vm/172.16.29.132:8020: getListing {src: 
> "/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c"
>  startAfter: "" needLocation: false}
> 2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] 
> org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive sending #510346
> 2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC 
> Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got 
> value #510346
> 2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> Call: getListing took 0ms
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Response <- cdh54.vm/172.16.29.132:8020: getListing {dirList { 
> partialListing { fileType: IS_FILE path: "" length: 863 permission { perm: 
> 4600 } owner: "hive" group: "hive" modification_time: 1437454718130 
> access_time: 1437454717973 block_replication: 1 blocksize: 134217728 fileId: 
> 33960 childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }}
> {code}
> The path we are getting out of the listing results is 
> {{/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c}},
>  but instead of checking the path's parent {{family}}, we loop 
> infinitely over its hashed filename {{97112ac1c09548ae87bd85af072d2e8c}} 
> because it does not match {{family}}.
> It therefore stays in the infinite loop until the MR framework kills it 
> due to an idle-task timeout (and then, since the subsequent task attempts fail 
> outright, the job fails).
> While doing a {{getPath().getParent()}} will resolve that, is that infinite 
> loop even necessary? Especially given the fact that we throw exceptions if 
> there are no entries or there is more than one entry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14593) Non-canonical integer partition columns do not work with IN operations

2016-08-21 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429771#comment-15429771
 ] 

Harsh J commented on HIVE-14593:


Thank you [~gopalv],

Would that keep compatibility? For instance, on such an existing table? And 
will inserting b=7 produce a second partition?

Can GenericUDFIn not be changed to match the MySQL approach instead, where the 
IN(…) types are converted to match the column type rather than vice versa? The 
class-doc states this was done to keep consistency with other UDFs.

{quote}
 * Also noteworthy: type conversion behavior is different from MySQL. With
 * expr IN expr1, expr2... in MySQL, exprN will each be converted into the same
 * type as expr. In the Hive implementation, all expr(N) will be converted into
 * a common type for conversion consistency with other UDF's, and to prevent
 * conversions from a big type to a small type (e.g. int to tinyint)
{quote}

> Non-canonical integer partition columns do not work with IN operations
> --
>
> Key: HIVE-14593
> URL: https://issues.apache.org/jira/browse/HIVE-14593
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.0.0
>Reporter: Harsh J
>
> The below use-case no longer works (tested on a PostgreSQL-backed HMS using 
> JDO as well as on a MySQL-backed HMS with DirectSQL):
> {code}
> CREATE TABLE foo (a STRING) PARTITIONED BY (b INT, c INT);
> ALTER TABLE foo ADD PARTITION (b='07', c='08');
> LOAD DATA LOCAL INPATH '/etc/hostname' INTO TABLE foo PARTITION(b='07', 
> c='08');
> -- Does not work if you provide a string IN variable:
> SELECT a, c FROM foo WHERE b IN ('07');
> (No rows selected)
> -- Works if you provide it in integer forms or canonical integer strings:
> SELECT a, c FROM foo WHERE b IN (07);
> (1 row(s) selected)
> SELECT a, c FROM foo WHERE b IN (7);
> (1 row(s) selected)
> SELECT a, c FROM foo WHERE b IN ('7');
> (1 row(s) selected)
> {code}
> This worked fine prior to HIVE-8099. The change in HIVE-8099 induces a 
> double conversion on the partition column input, such that GenericUDFIn 
> now receives b's value as the canonical integer 7 (converted to the column 
> type), rather than the non-canonical value 07 stored as-is in the DB. 
> Subsequently, GenericUDFIn up-converts b's value again to match its 
> arguments' value types, turning 7 (int) into the string "7". Then "7" 
> is compared against "07", which naturally never matches.
> As a regression, this breaks anyone upgrading from pre-1.0 to 1.0 or higher.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14593) Non-canonical integer partition columns do not work with IN operations

2016-08-19 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-14593:
---
Description: 
The below use-case no longer works (tested on a PostgreSQL-backed HMS using JDO 
as well as on a MySQL-backed HMS with DirectSQL):

{code}
CREATE TABLE foo (a STRING) PARTITIONED BY (b INT, c INT);
ALTER TABLE foo ADD PARTITION (b='07', c='08');
LOAD DATA LOCAL INPATH '/etc/hostname' INTO TABLE foo PARTITION(b='07', c='08');

-- Does not work if you provide a string IN variable:

SELECT a, c FROM foo WHERE b IN ('07');
(No rows selected)

-- Works if you provide it in integer forms or canonical integer strings:

SELECT a, c FROM foo WHERE b IN (07);
(1 row(s) selected)
SELECT a, c FROM foo WHERE b IN (7);
(1 row(s) selected)
SELECT a, c FROM foo WHERE b IN ('7');
(1 row(s) selected)
{code}

This worked fine prior to HIVE-8099. The change in HIVE-8099 induces a 
double conversion on the partition column input, such that GenericUDFIn 
now receives b's value as the canonical integer 7 (converted to the column 
type), rather than the non-canonical value 07 stored as-is in the DB. 
Subsequently, GenericUDFIn up-converts b's value again to match its 
arguments' value types, turning 7 (int) into the string "7". Then "7" is 
compared against "07", which naturally never matches.

As a regression, this breaks anyone upgrading from pre-1.0 to 1.0 or higher.

  was:
The below use-case no longer works (tested on a PostgreSQL-backed HMS using 
JDO):

{code}
CREATE TABLE foo (a STRING) PARTITIONED BY (b INT, c INT);
ALTER TABLE foo ADD PARTITION (b='07', c='08');
LOAD DATA LOCAL INPATH '/etc/hostname' INTO TABLE foo PARTITION(b='07', c='08');

-- Does not work if you provide a string IN variable:

SELECT a, c FROM foo WHERE b IN ('07');
(No rows selected)

-- Works if you provide it in integer forms:

SELECT a, c FROM foo WHERE b IN (07);
(1 row(s) selected)
SELECT a, c FROM foo WHERE b IN (7);
(1 row(s) selected)
{code}

This worked fine prior to HIVE-8099. The change in HIVE-8099 induces a 
double conversion on the partition column input, such that GenericUDFIn 
now receives b's value as the canonical integer 7 (converted to the column 
type), rather than the non-canonical value 07 stored as-is in the DB. 
Subsequently, GenericUDFIn up-converts b's value again to match its 
arguments' value types, turning 7 (int) into the string "7". Then "7" is 
compared against "07", which naturally never matches.

As a regression, this breaks anyone upgrading from pre-1.0 to 1.0 or higher.


> Non-canonical integer partition columns do not work with IN operations
> --
>
> Key: HIVE-14593
> URL: https://issues.apache.org/jira/browse/HIVE-14593
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.0.0
>Reporter: Harsh J
>
> The below use-case no longer works (tested on a PostgreSQL-backed HMS using 
> JDO as well as on a MySQL-backed HMS with DirectSQL):
> {code}
> CREATE TABLE foo (a STRING) PARTITIONED BY (b INT, c INT);
> ALTER TABLE foo ADD PARTITION (b='07', c='08');
> LOAD DATA LOCAL INPATH '/etc/hostname' INTO TABLE foo PARTITION(b='07', 
> c='08');
> -- Does not work if you provide a string IN variable:
> SELECT a, c FROM foo WHERE b IN ('07');
> (No rows selected)
> -- Works if you provide it in integer forms or canonical integer strings:
> SELECT a, c FROM foo WHERE b IN (07);
> (1 row(s) selected)
> SELECT a, c FROM foo WHERE b IN (7);
> (1 row(s) selected)
> SELECT a, c FROM foo WHERE b IN ('7');
> (1 row(s) selected)
> {code}
> This worked fine prior to HIVE-8099. The change in HIVE-8099 induces a 
> double conversion on the partition column input, such that GenericUDFIn 
> now receives b's value as the canonical integer 7 (converted to the column 
> type), rather than the non-canonical value 07 stored as-is in the DB. 
> Subsequently, GenericUDFIn up-converts b's value again to match its 
> arguments' value types, turning 7 (int) into the string "7". Then "7" 
> is compared against "07", which naturally never matches.
> As a regression, this breaks anyone upgrading from pre-1.0 to 1.0 or higher.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

2016-05-08 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275975#comment-15275975
 ] 

Harsh J commented on HIVE-13704:


[~ashutoshc] - The opposite. HADOOP-10459 added the new method call in 
{{run()}}, so any Hadoop release that includes that fix will no longer execute 
DistCp correctly from Hive, because Hive skips calling {{run()}}.

> Don't call DistCp.execute() instead of DistCp.run()
> ---
>
> Key: HIVE-13704
> URL: https://issues.apache.org/jira/browse/HIVE-13704
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Harsh J
>Priority: Critical
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} 
> method runs additional logic that drives the state of {{SimpleCopyListing}}, which 
> runs in the driver, and of {{CopyCommitter}}, which runs in the job runtime.
> When Hive ends up running DistCp for copy work (between non-matching filesystems or 
> between encrypted/non-encrypted zones, for sizes above a configured value), 
> this state not being set causes wrong paths to appear on the target (subdirs 
> named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method rather than the {{execute}} method 
> directly, so as not to skip the target-exists flag that the {{setTargetPathExists}} 
> call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126
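
As a minimal sketch of the suggested change (not the actual Hive shims code; the 
DistCpRunner wrapper and the bare src/dst argument list are illustrative 
assumptions), driving DistCp through its Tool {{run}} entry point so the option 
parsing and {{setTargetPathExists}} logic is not skipped:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.tools.DistCp;

public final class DistCpRunner {
  // Invokes DistCp via Tool.run(String[]) so run() can parse the arguments
  // and call setTargetPathExists(), instead of building DistCpOptions and
  // calling execute() directly.
  public static boolean copy(Configuration conf, String src, String dst) throws Exception {
    DistCp distCp = new DistCp(conf, null); // options are parsed from the args inside run()
    return distCp.run(new String[] { src, dst }) == 0;
  }
}
{code}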



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11607) Export tables broken for data > 32 MB

2016-05-06 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274680#comment-15274680
 ] 

Harsh J commented on HIVE-11607:


This fix breaks DistCp copies in Hadoop 2.5.0+ (due to HADOOP-10459). See 
HIVE-13704.

> Export tables broken for data > 32 MB
> -
>
> Key: HIVE-11607
> URL: https://issues.apache.org/jira/browse/HIVE-11607
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 1.0.0, 1.2.0, 1.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11607.2.patch, HIVE-11607.3.patch, HIVE-11607.patch
>
>
> Broken for both the hadoop-1 and hadoop-2 lines



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

2016-05-06 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274004#comment-15274004
 ] 

Harsh J commented on HIVE-13704:


This is a problem only for the Hadoop23Shims DistCp callers, not for 
Hadoop20Shims, because branch-1's distcp2 in hadoop does not have such a 
state-setting function inside {{run()}}: 
https://github.com/apache/hadoop/blob/branch-1/src/tools/org/apache/hadoop/tools/distcp2/DistCp.java#L96-L114

> Don't call DistCp.execute() instead of DistCp.run()
> ---
>
> Key: HIVE-13704
> URL: https://issues.apache.org/jira/browse/HIVE-13704
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Harsh J
>Priority: Critical
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} 
> method runs additional logic that drives the state of {{SimpleCopyListing}}, which 
> runs in the driver, and of {{CopyCommitter}}, which runs in the job runtime.
> When Hive ends up running DistCp for copy work (between non-matching filesystems or 
> between encrypted/non-encrypted zones, for sizes above a configured value), 
> this state not being set causes wrong paths to appear on the target (subdirs 
> named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method rather than the {{execute}} method 
> directly, so as not to skip the target-exists flag that the {{setTargetPathExists}} 
> call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13275) Add a toString method to BytesRefArrayWritable

2016-04-08 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232906#comment-15232906
 ] 

Harsh J commented on HIVE-13275:


Failing tests don't appear to be related.

> Add a toString method to BytesRefArrayWritable
> --
>
> Key: HIVE-13275
> URL: https://issues.apache.org/jira/browse/HIVE-13275
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 1.1.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Attachments: HIVE-13275.000.patch
>
>
> RCFileInputFormat cannot be used externally for Hadoop Streaming today because 
> Streaming generally relies on the K/V pairs being able to emit text 
> representations (via toString()).
> Since BytesRefArrayWritable has no toString() method, using 
> RCFileInputFormat produces default object-representation prints, which are not 
> useful.
> Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an 
> array), so it's important to output them in a valid, parseable manner, as 
> opposed to choosing a simple joining delimiter over the string 
> representations of the inner elements.
> I propose adding a standardised CSV formatting of the array data, so that 
> users of Streaming can then parse the results in their own scripts. Since we 
> already have OpenCSV as a dependency, we can make use of it for this purpose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8906) Hive 0.14.0 release depends on Tez and Calcite SNAPSHOT artifacts

2016-03-29 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HIVE-8906.
---
Resolution: Not A Problem

Resolving per Alan's comment above.

> Hive 0.14.0 release depends on Tez and Calcite SNAPSHOT artifacts
> -
>
> Key: HIVE-8906
> URL: https://issues.apache.org/jira/browse/HIVE-8906
> Project: Hive
>  Issue Type: Bug
>Reporter: Carl Steinbach
>
> The Hive 0.14.0 release depends on SNAPSHOT versions of tez-0.5.2 and 
> calcite-0.9.2. I believe this violates Apache release policy (can't find the 
> reference, but I seem to remember this being a problem with HCatalog before 
> the merger), and it implies that the folks who tested the release weren't 
> necessarily testing the same thing. It also means that people who try to 
> build Hive using the 0.14.0 src release will encounter errors unless they 
> configure Maven to pull artifacts from the snapshot repository.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13275) Add a toString method to BytesRefArrayWritable

2016-03-13 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192388#comment-15192388
 ] 

Harsh J commented on HIVE-13275:


A couple of caveats though:
# While RCFile may natively support columns with newline characters in their 
data, the toString representation for use in Hadoop Streaming will likely not 
work well with that (because of the text format)
# If the bytes are encoded in any form other than simple text representations 
in the future, such as Avro, Protobuf, etc., the toString representation will 
no longer be directly useful

> Add a toString method to BytesRefArrayWritable
> --
>
> Key: HIVE-13275
> URL: https://issues.apache.org/jira/browse/HIVE-13275
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 1.1.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Attachments: HIVE-13275.000.patch
>
>
> RCFileInputFormat cannot be used externally for Hadoop Streaming today because 
> Streaming generally relies on the K/V pairs being able to emit text 
> representations (via toString()).
> Since BytesRefArrayWritable has no toString() method, using 
> RCFileInputFormat produces default object-representation prints, which are not 
> useful.
> Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an 
> array), so it's important to output them in a valid, parseable manner, as 
> opposed to choosing a simple joining delimiter over the string 
> representations of the inner elements.
> I propose adding a standardised CSV formatting of the array data, so that 
> users of Streaming can then parse the results in their own scripts. Since we 
> already have OpenCSV as a dependency, we can make use of it for this purpose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13275) Add a toString method to BytesRefArrayWritable

2016-03-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-13275:
---
Component/s: File Formats

> Add a toString method to BytesRefArrayWritable
> --
>
> Key: HIVE-13275
> URL: https://issues.apache.org/jira/browse/HIVE-13275
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 1.1.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Attachments: HIVE-13275.000.patch
>
>
> RCFileInputFormat cannot be used externally for Hadoop Streaming today because 
> Streaming generally relies on the K/V pairs being able to emit text 
> representations (via toString()).
> Since BytesRefArrayWritable has no toString() method, using 
> RCFileInputFormat produces default object-representation prints, which are not 
> useful.
> Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an 
> array), so it's important to output them in a valid, parseable manner, as 
> opposed to choosing a simple joining delimiter over the string 
> representations of the inner elements.
> I propose adding a standardised CSV formatting of the array data, so that 
> users of Streaming can then parse the results in their own scripts. Since we 
> already have OpenCSV as a dependency, we can make use of it for this purpose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13275) Add a toString method to BytesRefArrayWritable

2016-03-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-13275:
---
Status: Patch Available  (was: Open)

> Add a toString method to BytesRefArrayWritable
> --
>
> Key: HIVE-13275
> URL: https://issues.apache.org/jira/browse/HIVE-13275
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Attachments: HIVE-13275.000.patch
>
>
> RCFileInputFormat cannot be used externally for Hadoop Streaming today because 
> Streaming generally relies on the K/V pairs being able to emit text 
> representations (via toString()).
> Since BytesRefArrayWritable has no toString() method, using 
> RCFileInputFormat produces default object-representation prints, which are not 
> useful.
> Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an 
> array), so it's important to output them in a valid, parseable manner, as 
> opposed to choosing a simple joining delimiter over the string 
> representations of the inner elements.
> I propose adding a standardised CSV formatting of the array data, so that 
> users of Streaming can then parse the results in their own scripts. Since we 
> already have OpenCSV as a dependency, we can make use of it for this purpose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13275) Add a toString method to BytesRefArrayWritable

2016-03-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-13275:
---
Attachment: HIVE-13275.000.patch

> Add a toString method to BytesRefArrayWritable
> --
>
> Key: HIVE-13275
> URL: https://issues.apache.org/jira/browse/HIVE-13275
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 1.1.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Attachments: HIVE-13275.000.patch
>
>
> RCFileInputFormat cannot be used externally for Hadoop Streaming today because 
> Streaming generally relies on the K/V pairs being able to emit text 
> representations (via toString()).
> Since BytesRefArrayWritable has no toString() method, using 
> RCFileInputFormat produces default object-representation prints, which are not 
> useful.
> Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an 
> array), so it's important to output them in a valid, parseable manner, as 
> opposed to choosing a simple joining delimiter over the string 
> representations of the inner elements.
> I propose adding a standardised CSV formatting of the array data, so that 
> users of Streaming can then parse the results in their own scripts. Since we 
> already have OpenCSV as a dependency, we can make use of it for this purpose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-3335) Thousand of CLOSE_WAIT socket when we using SymbolicInputFormat

2016-02-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HIVE-3335.
---
Resolution: Duplicate

> Thousand of CLOSE_WAIT socket when we using SymbolicInputFormat
> ---
>
> Key: HIVE-3335
> URL: https://issues.apache.org/jira/browse/HIVE-3335
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 0.8.1
> Environment:  CentOS 5.8 x64
>  CDH3u4
>hadoop-0.20-0.20.2+923.256-1
>
> hadoop-0.20-{namenode,secondarynamenode,jobtracker,tasktracker,datanode}-0.20.2+923.256-1
>hadoop-0.20-conf-pseudo-0.20.2+923.256-1(but same error was
> occurred on not pseudo env)
>  apache hive-0.8.1(but same error was occurred on hive 0.9)
>Reporter: Yuki Yoi
> Attachments: HIVE-3335.patch
>
>
> Procedure for reproduction:
>  1. Set up hadoop
>  2. Prepare data file and link.txt:
> data:
>   $ hadoop fs -cat /path/to/data/2012-07-01/20120701.csv
>   1, 20120701 00:00:00
>   2, 20120701 00:00:01
>   3, 20120701 01:12:45
> link.txt
>   $ cat link.txt
>/path/to/data/2012-07-01//*
>  3. On hive, create a table like the one below:
>CREATE TABLE user_logs(id INT, created_at STRING)
>row format delimited fields terminated by ',' lines terminated by '\n'
>stored as inputformat 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
>outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
>  4. Put link.txt into /user/hive/warehouse/user_logs:
>$ sudo -u hdfs hadoop fs -put link.txt  /user/hive/warehouse/user_logs
>  5. Open another session (session A), and watch the sockets:
>$ netstat -a | grep CLOSE_WAIT
> tcp        1      0 localhost:48121         localhost:50010         CLOSE_WAIT
> tcp        1      0 localhost:48124         localhost:50010         CLOSE_WAIT
>$
>  6. Return to the hive session and execute this:
>$ select * from user_logs;
>  7. Return to session A and watch the sockets again:
>$ netstat -a | grep CLOSE_WAIT
>tcp        1      0 localhost:48121         localhost:50010         CLOSE_WAIT
>tcp        1      0 localhost:48124         localhost:50010         CLOSE_WAIT
>tcp        1      0 localhost:48166         localhost:50010         CLOSE_WAIT
>  If you use any partitions, you'll see unclosed sockets whose count equals the 
> number of partitions each time.
> I think this problem is caused by this point:
>   At 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java,
>   line 66, a BufferedReader is opened but never closed.
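
For illustration (this is not the attached HIVE-3335.patch nor the HIVE-3480 fix; the 
SymlinkReader class and readTargets method are made-up names): a minimal sketch of the 
close-the-reader idea.

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class SymlinkReader {
  // Reads the target paths listed in a symlink file, closing the stream via
  // try-with-resources so the DataNode connection is not left in CLOSE_WAIT.
  public static List<Path> readTargets(FileSystem fs, Path symlinkFile) throws IOException {
    List<Path> targets = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(symlinkFile), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        targets.add(new Path(line));
      }
    }
    return targets;
  }
}
{code}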



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3335) Thousand of CLOSE_WAIT socket when we using SymbolicInputFormat

2016-02-26 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168697#comment-15168697
 ] 

Harsh J commented on HIVE-3335:
---

This was fixed via HIVE-3480

> Thousand of CLOSE_WAIT socket when we using SymbolicInputFormat
> ---
>
> Key: HIVE-3335
> URL: https://issues.apache.org/jira/browse/HIVE-3335
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 0.8.1
> Environment:  CentOS 5.8 x64
>  CDH3u4
>hadoop-0.20-0.20.2+923.256-1
>
> hadoop-0.20-{namenode,secondarynamenode,jobtracker,tasktracker,datanode}-0.20.2+923.256-1
>hadoop-0.20-conf-pseudo-0.20.2+923.256-1(but same error was
> occurred on not pseudo env)
>  apache hive-0.8.1(but same error was occurred on hive 0.9)
>Reporter: Yuki Yoi
> Attachments: HIVE-3335.patch
>
>
> Procedure for reproduction:
>  1. Set up hadoop
>  2. Prepare data file and link.txt:
> data:
>   $ hadoop fs -cat /path/to/data/2012-07-01/20120701.csv
>   1, 20120701 00:00:00
>   2, 20120701 00:00:01
>   3, 20120701 01:12:45
> link.txt
>   $ cat link.txt
>/path/to/data/2012-07-01//*
>  3. On hive, create a table like the one below:
>CREATE TABLE user_logs(id INT, created_at STRING)
>row format delimited fields terminated by ',' lines terminated by '\n'
>stored as inputformat 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
>outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
>  4. Put link.txt into /user/hive/warehouse/user_logs:
>$ sudo -u hdfs hadoop fs -put link.txt  /user/hive/warehouse/user_logs
>  5. Open another session (session A), and watch the sockets:
>$ netstat -a | grep CLOSE_WAIT
> tcp        1      0 localhost:48121         localhost:50010         CLOSE_WAIT
> tcp        1      0 localhost:48124         localhost:50010         CLOSE_WAIT
>$
>  6. Return to the hive session and execute this:
>$ select * from user_logs;
>  7. Return to session A and watch the sockets again:
>$ netstat -a | grep CLOSE_WAIT
>tcp        1      0 localhost:48121         localhost:50010         CLOSE_WAIT
>tcp        1      0 localhost:48124         localhost:50010         CLOSE_WAIT
>tcp        1      0 localhost:48166         localhost:50010         CLOSE_WAIT
>  If you use any partitions, you'll see unclosed sockets whose count equals the 
> number of partitions each time.
> I think this problem is caused by this point:
>   At 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java,
>   line 66, a BufferedReader is opened but never closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11325) Infinite loop in HiveHFileOutputFormat

2015-10-15 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reassigned HIVE-11325:
--

Assignee: Harsh J

> Infinite loop in HiveHFileOutputFormat
> --
>
> Key: HIVE-11325
> URL: https://issues.apache.org/jira/browse/HIVE-11325
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 1.0.0
>Reporter: Harsh J
>Assignee: Harsh J
> Attachments: HIVE-11325.patch
>
>
> No idea why {{hbase_handler_bulk.q}} does not catch this if it's being run 
> regularly in Hive builds, but here's the gist of the issue:
> The condition at 
> https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164
>  indicates that we will loop infinitely until we find a file whose last path 
> component (the name) is equal to the column family name.
> In execution, however, the iteration enters an actual infinite loop because the 
> file we end up considering as the srcDir name is actually the region file, 
> whose name will never match the family name.
> This is an example of the IPC calls that the listing loop of a task at 100% 
> progress gets stuck in:
> {code}
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Call -> cdh54.vm/172.16.29.132:8020: getListing {src: 
> "/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c"
>  startAfter: "" needLocation: false}
> 2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] 
> org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive sending #510346
> 2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC 
> Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got 
> value #510346
> 2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> Call: getListing took 0ms
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Response <- cdh54.vm/172.16.29.132:8020: getListing {dirList { 
> partialListing { fileType: IS_FILE path: "" length: 863 permission { perm: 
> 4600 } owner: "hive" group: "hive" modification_time: 1437454718130 
> access_time: 1437454717973 block_replication: 1 blocksize: 134217728 fileId: 
> 33960 childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }}
> {code}
> The path we are getting out of the listing results is 
> {{/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c}},
>  but instead of checking the path's parent {{family}}, we loop 
> infinitely over its hashed filename {{97112ac1c09548ae87bd85af072d2e8c}} 
> because it does not match {{family}}.
> It therefore stays in the infinite loop until the MR framework kills it 
> due to an idle-task timeout (and then, since the subsequent task attempts fail 
> outright, the job fails).
> While doing a {{getPath().getParent()}} will resolve that, is that infinite 
> loop even necessary? Especially given the fact that we throw exceptions if 
> there are no entries or there is more than one entry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11325) Infinite loop in HiveHFileOutputFormat

2015-07-21 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-11325:
---
Attachment: HIVE-11325.patch

> Infinite loop in HiveHFileOutputFormat
> --
>
> Key: HIVE-11325
> URL: https://issues.apache.org/jira/browse/HIVE-11325
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 1.0.0
>Reporter: Harsh J
> Attachments: HIVE-11325.patch
>
>
> No idea why {{hbase_handler_bulk.q}} does not catch this if it's being run 
> regularly in Hive builds, but here's the gist of the issue:
> The condition at 
> https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164
>  indicates that we will loop infinitely until we find a file whose last path 
> component (the name) is equal to the column family name.
> In execution, however, the iteration enters an actual infinite loop because the 
> file we end up considering as the srcDir name is actually the region file, 
> whose name will never match the family name.
> This is an example of the IPC calls that the listing loop of a task at 100% 
> progress gets stuck in:
> {code}
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Call -> cdh54.vm/172.16.29.132:8020: getListing {src: 
> "/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c"
>  startAfter: "" needLocation: false}
> 2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] 
> org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive sending #510346
> 2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC 
> Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got 
> value #510346
> 2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> Call: getListing took 0ms
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Response <- cdh54.vm/172.16.29.132:8020: getListing {dirList { 
> partialListing { fileType: IS_FILE path: "" length: 863 permission { perm: 
> 4600 } owner: "hive" group: "hive" modification_time: 1437454718130 
> access_time: 1437454717973 block_replication: 1 blocksize: 134217728 fileId: 
> 33960 childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }}
> {code}
> The path we are getting out of the listing results is 
> {{/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c}},
>  but instead of checking the path's parent {{family}}, we loop 
> infinitely over its hashed filename {{97112ac1c09548ae87bd85af072d2e8c}} 
> because it does not match {{family}}.
> It therefore stays in the infinite loop until the MR framework kills it 
> due to an idle-task timeout (and then, since the subsequent task attempts fail 
> outright, the job fails).
> While doing a {{getPath().getParent()}} will resolve that, is that infinite 
> loop even necessary? Especially given the fact that we throw exceptions if 
> there are no entries or there is more than one entry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11325) Infinite loop in HiveHFileOutputFormat

2015-07-21 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634795#comment-14634795
 ] 

Harsh J commented on HIVE-11325:


So this happens if the family name is not provided correctly via the {{set 
hfile.family.path=/path/…/familyname;}} statement. We should avoid looping 
once we've reached a file instead of a directory, to prevent the hang. We could 
throw an error instead.
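
To make that concrete, a rough sketch of the walk-until-family-dir-or-fail idea 
(class and method names are made up, and the real HiveHFileOutputFormat logic 
differs in detail):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class FamilyDirLocator {
  // Walks down from outputdir until the single child found is the column
  // family directory, and fails fast instead of looping forever once the
  // walk reaches a plain file (e.g. a region file).
  public static Path findFamilyDir(FileSystem fs, Path outputdir, String columnFamilyName)
      throws IOException {
    Path srcDir = outputdir;
    while (true) {
      FileStatus[] children = fs.listStatus(srcDir);
      if (children == null || children.length != 1) {
        throw new IOException("Expected exactly one child under " + srcDir);
      }
      FileStatus child = children[0];
      if (child.getPath().getName().equals(columnFamilyName)) {
        return child.getPath();
      }
      if (!child.isDirectory()) {
        // Hitting a file means the family directory will never be found;
        // erroring out here avoids the infinite loop described in this issue.
        throw new IOException("Reached file " + child.getPath()
            + " without finding column family directory " + columnFamilyName);
      }
      srcDir = child.getPath();
    }
  }
}
{code}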

> Infinite loop in HiveHFileOutputFormat
> --
>
> Key: HIVE-11325
> URL: https://issues.apache.org/jira/browse/HIVE-11325
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 1.0.0
>Reporter: Harsh J
>
> No idea why {{hbase_handler_bulk.q}} does not catch this if it's being run 
> regularly in Hive builds, but here's the gist of the issue:
> The condition at 
> https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164
>  indicates that we will loop infinitely until we find a file whose last path 
> component (the name) is equal to the column family name.
> In execution, however, the iteration enters an actual infinite loop because the 
> file we end up considering as the srcDir name is actually the region file, 
> whose name will never match the family name.
> This is an example of the IPC calls that the listing loop of a task at 100% 
> progress gets stuck in:
> {code}
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Call -> cdh54.vm/172.16.29.132:8020: getListing {src: 
> "/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c"
>  startAfter: "" needLocation: false}
> 2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] 
> org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive sending #510346
> 2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC 
> Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got 
> value #510346
> 2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> Call: getListing took 0ms
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Response <- cdh54.vm/172.16.29.132:8020: getListing {dirList { 
> partialListing { fileType: IS_FILE path: "" length: 863 permission { perm: 
> 4600 } owner: "hive" group: "hive" modification_time: 1437454718130 
> access_time: 1437454717973 block_replication: 1 blocksize: 134217728 fileId: 
> 33960 childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }}
> {code}
> The path we are getting out of the listing results is 
> {{/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c}},
>  but instead of checking the path's parent {{family}}, we loop 
> infinitely over its hashed filename {{97112ac1c09548ae87bd85af072d2e8c}} 
> because it does not match {{family}}.
> It therefore stays in the infinite loop until the MR framework kills it 
> due to an idle-task timeout (and then, since the subsequent task attempts fail 
> outright, the job fails).
> While doing a {{getPath().getParent()}} will resolve that, is that infinite 
> loop even necessary? Especially given the fact that we throw exceptions if 
> there are no entries or there is more than one entry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11325) Infinite loop in HiveHFileOutputFormat

2015-07-21 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634774#comment-14634774
 ] 

Harsh J commented on HIVE-11325:


I missed the srcDir declaration, which would explain the loop (we're walking 
down the tree). I'm checking why it doesn't abort at the family directory.

> Infinite loop in HiveHFileOutputFormat
> --
>
> Key: HIVE-11325
> URL: https://issues.apache.org/jira/browse/HIVE-11325
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 1.0.0
>Reporter: Harsh J
>
> No idea why {{hbase_handler_bulk.q}} does not catch this if it's being run 
> regularly in Hive builds, but here's the gist of the issue:
> The condition at 
> https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164
>  indicates that we will infinitely loop until we find a file whose last path 
> component (the name) is equal to the column family name.
> In execution, however, the iteration enters an actual infinite loop because 
> the file we end up considering as the srcDir name is actually the region 
> file, whose name will never match the family name.
> This is an example of the IPC exchange that the listing loop of a task at 
> 100% progress keeps repeating:
> {code}
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Call -> cdh54.vm/172.16.29.132:8020: getListing {src: 
> "/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c"
>  startAfter: "" needLocation: false}
> 2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] 
> org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive sending #510346
> 2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to 
> cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC 
> Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got 
> value #510346
> 2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> Call: getListing took 0ms
> 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
> 1: Response <- cdh54.vm/172.16.29.132:8020: getListing {dirList { 
> partialListing { fileType: IS_FILE path: "" length: 863 permission { perm: 
> 4600 } owner: "hive" group: "hive" modification_time: 1437454718130 
> access_time: 1437454717973 block_replication: 1 blocksize: 134217728 fileId: 
> 33960 childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }}
> {code}
> The path we are getting out of the listing results is 
> {{/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c}},
> but instead of checking the path's parent {{family}}, we loop infinitely 
> over its hashed filename {{97112ac1c09548ae87bd85af072d2e8c}} because it 
> does not match {{family}}.
> It therefore stays in the infinite loop until the MR framework kills it due 
> to an idle task timeout (and since the subsequent task attempts fail 
> outright, the job fails).
> While doing a {{getPath().getParent()}} would resolve that, is that infinite 
> loop even necessary, given that we already throw exceptions if there are no 
> entries or more than one entry?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate

2015-04-24 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512276#comment-14512276
 ] 

Harsh J commented on HIVE-9534:
---

Applying a similar SQL statement in PostgreSQL or Impala returns an error of 
the form "DISTINCT is not implemented for window functions". Unless Hive has 
added proper support for DISTINCT in this context, it should likely output 
the same error (if not a bug fix).
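
A hypothetical sketch of the kind of guard suggested here (the names are made 
up and Hive's actual analyzer types differ): reject the query during analysis 
when a windowed aggregate carries DISTINCT, instead of silently returning one 
row.

{code}
// Hypothetical guard, not Hive's real analyzer API: if an aggregate call is
// both DISTINCT and windowed, surface an explicit error like PostgreSQL and
// Impala do.
public final class WindowedDistinctCheck {
  public static void check(boolean isDistinct, boolean hasOverClause,
      String functionName) {
    if (isDistinct && hasOverClause) {
      throw new IllegalArgumentException(
          "DISTINCT is not implemented for window functions: " + functionName);
    }
  }
}
{code}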

> incorrect result set for query that projects a windowed aggregate
> -
>
> Key: HIVE-9534
> URL: https://issues.apache.org/jira/browse/HIVE-9534
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Reporter: N Campbell
>
> Result set returned by Hive has one row instead of 5
> {code}
> select avg(distinct tsint.csint) over () from tsint 
> create table  if not exists TSINT (RNUM int , CSINT smallint)
>  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS TEXTFILE;
> 0|\N
> 1|-1
> 2|0
> 3|1
> 4|10
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons

2015-03-16 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363733#comment-14363733
 ] 

Harsh J commented on HIVE-9870:
---

[~vgumashta] - Just checking to see if any other changes are required or if 
this can be committed?

> Add JvmPauseMonitor threads to HMS and HS2 daemons
> --
>
> Key: HIVE-9870
> URL: https://issues.apache.org/jira/browse/HIVE-9870
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.1.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-9870.patch, HIVE-9870.patch, HIVE-9870.patch
>
>
> The hadoop-common project carries a nifty thread that prints GC or non-GC 
> pauses within the JVM whenever they exceed a specific threshold.
> This has been immeasurably useful in supporting several clusters, by 
> identifying GC or other forms of process pauses as the root cause of an 
> event under investigation.
> The HMS and HS2 daemons are good targets for running similar threads within 
> them; the monitor can be loaded in an if-available style.
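
A minimal sketch of the if-available loading mentioned above; it assumes the 
hadoop-common class name {{org.apache.hadoop.util.JvmPauseMonitor}} with its 
older (Configuration) constructor and {{start()}} method, which vary across 
Hadoop releases, hence the reflection and the swallow-and-log fallback:

{code}
// Sketch only: start the pause monitor reflectively so the daemon keeps
// working on Hadoop versions where the class (or this constructor) is absent.
import org.apache.hadoop.conf.Configuration;

public final class PauseMonitorSupport {
  public static void startIfAvailable(Configuration conf) {
    try {
      Class<?> clazz = Class.forName("org.apache.hadoop.util.JvmPauseMonitor");
      Object monitor = clazz.getConstructor(Configuration.class).newInstance(conf);
      clazz.getMethod("start").invoke(monitor);
    } catch (Throwable t) {
      // JvmPauseMonitor not available, or its API changed: skip monitoring
      // rather than failing daemon startup.
      System.err.println("JvmPauseMonitor not started: " + t);
    }
  }
}
{code}

Called once during HMS or HS2 startup, such a helper would surface pause 
warnings without introducing a hard dependency on any particular Hadoop 
version.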



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons

2015-03-09 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-9870:
--
Attachment: HIVE-9870.patch

Sorry, I missed the new try-catch block change in the previous attachment. It 
should work this time.

> Add JvmPauseMonitor threads to HMS and HS2 daemons
> --
>
> Key: HIVE-9870
> URL: https://issues.apache.org/jira/browse/HIVE-9870
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.1
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-9870.patch, HIVE-9870.patch, HIVE-9870.patch
>
>
> The hadoop-common project carries a nifty thread that prints GC or non-GC 
> pauses within the JVM whenever they exceed a specific threshold.
> This has been immeasurably useful in supporting several clusters, by 
> identifying GC or other forms of process pauses as the root cause of an 
> event under investigation.
> The HMS and HS2 daemons are good targets for running similar threads within 
> them; the monitor can be loaded in an if-available style.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons

2015-03-05 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-9870:
--
Attachment: HIVE-9870.patch

Thanks [~vgumashta], I've moved it outside of that try-catch block.

> Add JvmPauseMonitor threads to HMS and HS2 daemons
> --
>
> Key: HIVE-9870
> URL: https://issues.apache.org/jira/browse/HIVE-9870
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.1
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-9870.patch, HIVE-9870.patch
>
>
> The hadoop-common project carries a nifty thread that prints GC or non-GC 
> pauses within the JVM whenever they exceed a specific threshold.
> This has been immeasurably useful in supporting several clusters, by 
> identifying GC or other forms of process pauses as the root cause of an 
> event under investigation.
> The HMS and HS2 daemons are good targets for running similar threads within 
> them; the monitor can be loaded in an if-available style.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons

2015-03-05 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-9870:
--
Attachment: (was: HIVE-9870.patch)

> Add JvmPauseMonitor threads to HMS and HS2 daemons
> --
>
> Key: HIVE-9870
> URL: https://issues.apache.org/jira/browse/HIVE-9870
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.1
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-9870.patch
>
>
> The hadoop-common project carries a nifty thread that prints GC or non-GC 
> pauses within the JVM whenever they exceed a specific threshold.
> This has been immeasurably useful in supporting several clusters, by 
> identifying GC or other forms of process pauses as the root cause of an 
> event under investigation.
> The HMS and HS2 daemons are good targets for running similar threads within 
> them; the monitor can be loaded in an if-available style.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons

2015-03-05 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-9870:
--
Attachment: HIVE-9870.patch

A second commit got accidentally added in the previous patch. Removed it.

> Add JvmPauseMonitor threads to HMS and HS2 daemons
> --
>
> Key: HIVE-9870
> URL: https://issues.apache.org/jira/browse/HIVE-9870
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.1
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-9870.patch
>
>
> The hadoop-common project carries a nifty thread that prints GC or non-GC 
> pauses within the JVM whenever they exceed a specific threshold.
> This has been immeasurably useful in supporting several clusters, by 
> identifying GC or other forms of process pauses as the root cause of an 
> event under investigation.
> The HMS and HS2 daemons are good targets for running similar threads within 
> them; the monitor can be loaded in an if-available style.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons

2015-03-05 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-9870:
--
Attachment: HIVE-9870.patch

> Add JvmPauseMonitor threads to HMS and HS2 daemons
> --
>
> Key: HIVE-9870
> URL: https://issues.apache.org/jira/browse/HIVE-9870
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.1
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-9870.patch
>
>
> The hadoop-common project carries a nifty thread that prints GC or non-GC 
> pauses within the JVM whenever they exceed a specific threshold.
> This has been immeasurably useful in supporting several clusters, by 
> identifying GC or other forms of process pauses as the root cause of an 
> event under investigation.
> The HMS and HS2 daemons are good targets for running similar threads within 
> them; the monitor can be loaded in an if-available style.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons

2015-03-05 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-9870:
--
Attachment: (was: HIVE-9870.patch)

> Add JvmPauseMonitor threads to HMS and HS2 daemons
> --
>
> Key: HIVE-9870
> URL: https://issues.apache.org/jira/browse/HIVE-9870
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.1
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
>
> The hadoop-common project carries a nifty thread that prints GC or non-GC 
> pauses within the JVM whenever they exceed a specific threshold.
> This has been immeasurably useful in supporting several clusters, by 
> identifying GC or other forms of process pauses as the root cause of an 
> event under investigation.
> The HMS and HS2 daemons are good targets for running similar threads within 
> them; the monitor can be loaded in an if-available style.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons

2015-03-05 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-9870:
--
Attachment: HIVE-9870.patch

> Add JvmPauseMonitor threads to HMS and HS2 daemons
> --
>
> Key: HIVE-9870
> URL: https://issues.apache.org/jira/browse/HIVE-9870
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.1
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-9870.patch
>
>
> The hadoop-common project carries a nifty thread that prints GC or non-GC 
> pauses within the JVM whenever they exceed a specific threshold.
> This has been immeasurably useful in supporting several clusters, by 
> identifying GC or other forms of process pauses as the root cause of an 
> event under investigation.
> The HMS and HS2 daemons are good targets for running similar threads within 
> them; the monitor can be loaded in an if-available style.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)