Re: In reduce task,i have a join operation ,and i found "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" cast much long

2017-10-19 Thread Gopal Vijayaraghavan
> I didn't see data skew for that reducer. It has similar amount of 
> REDUCE_INPUT_RECORDS as other reducers.
…
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 8000 rows for 
> join key [4092813312923569]


The ratio of REDUCE_INPUT_RECORDS to REDUCE_INPUT_GROUPS is what is relevant.

 

The row containers being spilled to disk means that at least 1 key in the join 
has > 1 values.

If you have Tez, this comes up when you run the SkewAnalyzer.

https://github.com/apache/tez/blob/master/tez-tools/analyzers/job-analyzer/src/main/java/org/apache/tez/analyzer/plugins/SkewAnalyzer.java#L41
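
A minimal sketch of the counter check described above, assuming the per-reducer counter values have been copied by hand from the job UI into a dict. The function names, the sample numbers, and the threshold are hypothetical and only illustrate the ratio; this is not how the SkewAnalyzer itself works.

```python
def records_per_group(counters):
    """Average number of input rows per distinct key for one reducer."""
    groups = counters["REDUCE_INPUT_GROUPS"]
    if groups == 0:
        return 0.0
    return counters["REDUCE_INPUT_RECORDS"] / groups


def flag_skewed(reducers, threshold=1000.0):
    """Return the names of reducers whose rows-per-key ratio exceeds threshold.

    The threshold is an arbitrary illustration value, not a Hive default.
    """
    return [name for name, c in sorted(reducers.items())
            if records_per_group(c) > threshold]


# Made-up numbers: both reducers read the same number of records,
# but r2 sees far fewer distinct keys, so its ratio is much higher.
example = {
    "r1": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 900_000},
    "r2": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 50},
}
print(flag_skewed(example))
```

Two reducers with the same REDUCE_INPUT_RECORDS can look identical at first glance; dividing by REDUCE_INPUT_GROUPS is what separates them.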

 

Cheers,

Gopal



Re: In reduce task,i have a join operation ,and i found "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" cast much long

2017-10-19 Thread Daniel Bruce
Hi Feng,

I've seen exactly the same problem with one of my queries. There is one reducer
hanging forever. I didn't see data skew for that reducer. It has a similar
amount of REDUCE_INPUT_RECORDS as other reducers. But this number stopped
changing and the reducer just hangs.

Does anybody else know what's happening there?

Daniel
From "Feng Yuan"
Subject In reduce task,i have a join operation ,and i found
"org.apache.hadoop.mapred.FileInputFormat: Total input paths to process :
1" cast much long
Date Mon, 10 Apr 2017 06:51:26 GMT

The log is :
2017-04-10 01:34:22,375 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
2017-04-10 01:36:32,551 INFO [main] ExecReducer: ExecReducer: processing 200 rows: used memory = 101789096
2017-04-10 01:37:03,284 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 1000 rows for join key [4092813312923569]
2017-04-10 01:37:03,286 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 2000 rows for join key [4092813312923569]
2017-04-10 01:37:03,291 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 4000 rows for join key [4092813312923569]
2017-04-10 01:37:03,301 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 8000 rows for join key [4092813312923569]
2017-04-10 01:37:03,379 INFO [main] org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /data9/hadoop/local/usercache/xx/appcache/application_1482905245692_364/container_1482905245692_364_01_000330/tmp/hive-rowcontainer5366426093735775537/RowContainer3525630608978801813.tmp
2017-04-10 01:37:04,559 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
2017-04-10 07:17:47,584 INFO [main] org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /data9/hadoop/local/usercache/xx/appcache/application_1482905245692_364/container_1482905245692_364_01_000330/tmp/hive-rowcontainer8292833982081568523/RowContainer734749216866467280.tmp
2017-04-10 07:17:47,775 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
2017-04-10 07:21:57,890 INFO [main] org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /data9/hadoop/local/usercache/xx/appcache/application_1482905245692_364/container_1482905245692_364_01_000330/tmp/hive-rowcontainer3072958941479299308/RowContainer1838954978169271208.tmp
2017-04-10 07:21:58,119 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
2017-04-10 07:24:07,796 INFO [main] org.apach
=
What I know is that there is a join operation, but what does
"org.apache.hadoop.mapred.FileInputFormat:
Total input paths to process : 1" mean?
Is there some data it needs to read from HDFS? More critically, why is it so
slow, from 2017-04-10 01:37:04 to 2017-04-10 07:17:47?
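
A small sketch for locating the stall in a task log like the one above. It assumes every entry starts with a timestamp of the form shown in the log; the function name largest_gap is hypothetical, and the sample lines are copied from the log in this message.

```python
import re
from datetime import datetime

# Matches the leading timestamp of an entry, e.g. "2017-04-10 01:37:04,559".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def largest_gap(lines):
    """Return (start, end, delta) for the longest pause between log entries.

    Lines that do not begin with a timestamp (wrapped continuations) are
    skipped. Returns None if fewer than two timestamps are found.
    """
    stamps = []
    for line in lines:
        m = TS.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    best = None
    for prev, cur in zip(stamps, stamps[1:]):
        if best is None or cur - prev > best[2]:
            best = (prev, cur, cur - prev)
    return best


sample = [
    "2017-04-10 01:37:03,379 INFO [main] org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file",
    "2017-04-10 01:37:04,559 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1",
    "2017-04-10 07:17:47,584 INFO [main] org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file",
    "2017-04-10 07:17:47,775 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1",
]
start, end, delta = largest_gap(sample)
print(start, "->", end, "took", delta)  # the gap is 5:40:43.025000
```

The entry printed just before the gap only marks where the pause begins; it does not by itself say what the task was doing during that time.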


Re: Find a possible problem in wiki content

2017-10-19 Thread Andrew Sherman
Hi 孙志禹,

I think the text is referring to the javadoc for the Matcher class in Java.

I added a link on
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF which
may have been the page you meant.

-Andrew

On Thu, Oct 19, 2017 at 5:25 AM, 孙志禹  wrote:

> In the link
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select
> where there is an explanation of the function *regexp_extract* (in the
> picture below), there is a reference to the HTML page
> *docs/api/java/util/regex/Matcher.html*, which cannot be opened directly.
> Something seems wrong; if it isn't, I want to ask for the right way to
> open that page.
> Thanks.
> [image: Inline image]
>


Find a possible problem in wiki content

2017-10-19 Thread 孙志禹
In the link 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select where 
there is an explanation of the function regexp_extract (in the picture below), 
there is a reference to the HTML page docs/api/java/util/regex/Matcher.html, 
which cannot be opened directly. Something seems wrong; if it isn't, I want 
to ask for the right way to open that page. Thanks.


Re: can i have write privilege to Hive Wiki

2017-10-19 Thread Lefty Leverenz
Done.  Welcome to the Hive wiki team, Slim!

-- Lefty



On Wed, Oct 18, 2017 at 11:04 AM, Slim Bouguerra 
wrote:

> User name is bslim and address is bs...@apache.org
> thanks
>
>