Re: Error when #2 Step: Redistribute Flat Hive Table - File does not exist

2016-12-21 Thread ShaoFeng Shi
BTW: KYLIN-2165 will handle this in a more elegant way; it will be released
in the next version.


Re: Error when #2 Step: Redistribute Flat Hive Table - File does not exist

2016-12-21 Thread ShaoFeng Shi
"java.io.FileNotFoundException: File does not exist:
/young/kylin_test/kylin_metadata_test/kylin-678266c0-ba0e-48b4-bdb5-6e578320375a/row_count/00_0"

This seems to be the issue https://issues.apache.org/jira/browse/KYLIN-2159
(it should be fixed in 1.6.0). Could you please check whether there is any
file in the folder?

hadoop fs -ls /young/kylin_test/kylin_metadata_test/kylin-678266c0-ba0e-48b4-bdb5-6e578320375a/row_count/
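
The check above asks whether the row_count directory holds any data file at all. The Python sketch below shows what a reader of that directory has to cope with; it is not Kylin's implementation. It assumes the directory has been copied to a local filesystem (e.g. with `hadoop fs -get`) and that the output is uncompressed text, which is why the generated script sets hive.exec.compress.output=false; `read_row_count` is a hypothetical helper.

```python
from pathlib import Path


def read_row_count(row_count_dir: str) -> int:
    """Read the count written by INSERT OVERWRITE DIRECTORY ... SELECT count(*)."""
    directory = Path(row_count_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"row count directory does not exist: {directory}")
    # Skip marker/hidden files such as "_SUCCESS" or ".crc" checksums.
    outputs = sorted(
        p for p in directory.iterdir()
        if p.is_file() and not p.name.startswith(("_", "."))
    )
    if not outputs:
        # The situation in this thread: the directory exists,
        # but the expected data file (e.g. "00_0") is missing.
        raise FileNotFoundError(f"no data file found in {directory}")
    # SELECT count(*) yields a single row, so the first non-empty line
    # of the first data file holds the count (plain text assumed).
    for line in outputs[0].read_text().splitlines():
        if line.strip():
            return int(line.strip())
    raise ValueError(f"{outputs[0]} is empty")
```

Usage: after `hadoop fs -get <row_count dir> /tmp/row_count`, `read_row_count("/tmp/row_count")` either returns the count or raises `FileNotFoundError` when the directory is empty, which is exactly the case worth ruling out with `ls`.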

2016-12-20 15:23 GMT+08:00 Alberto Ramón :

> Another idea:
> Could it be a permissions problem, i.e. the user that executes Kylin can't
> read the data generated by YARN?
> Check whether the Kylin user can read your folder /young/kylin_test/.
> Which Hadoop user is executing Kylin?
>
> (No more ideas. Good luck!)
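
The permission question can be checked against the mode string and the owner/group columns printed by `hadoop fs -ls` (a user's groups can be listed with `hdfs groups <user>`). Below is a simplified Python sketch, assuming HDFS's POSIX-style owner/group/other model; `can_read` is a hypothetical helper, and HDFS ACLs, the HDFS superuser, and the execute (x) bit needed on every parent directory are not modelled.

```python
def can_read(mode: str, owner: str, group: str, user: str, user_groups: set) -> bool:
    """Return True if `user` has the read bit on an entry with this mode.

    `mode` is the permission string shown by `hadoop fs -ls`,
    e.g. "drwxr-x---"; a trailing "+" (ACL present) is ignored here.
    """
    mode = mode.rstrip("+")
    if len(mode) != 10:
        raise ValueError(f"unexpected mode string: {mode!r}")
    owner_bits, group_bits, other_bits = mode[1:4], mode[4:7], mode[7:10]
    # POSIX semantics: only the first matching class applies.
    if user == owner:
        return owner_bits[0] == "r"
    if group in user_groups:
        return group_bits[0] == "r"
    return other_bits[0] == "r"
```

For example, a folder listed as `drwxr-x---` owned by another user is readable by the Kylin user only if that user belongs to the folder's group.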
>
> 2016-12-20 7:51 GMT+01:00 雨日听风 <491245...@qq.com>:
>
>> Thank you!
>> We checked YARN and the hard disk but did not find any error. Hard disk
>> space, memory, and so on are fine.
>> Last time the error was "unknownhost clusterB"; now, in the new server
>> environment, it can find clusterB (HBase only), but it can't find the
>> rowCount file.
>> ===
>> The following command runs OK:
>> hdfs dfs -mkdir /young/kylin_test/kylin_metadata_nokia/kylin-678c15ba-5375-4f80-831e-1ae0af8ed576/row_count/tmp
>> And "ls" can't find the file "00_0" that the error says does not exist.
>>
>> -- Original Message --
>> *From:* "Alberto Ramón";
>> *Sent:* Monday, December 19, 2016, 9:13 PM
>> *To:* "user";
>> *Subject:* Re: Error when #2 Step: Redistribute Flat Hive Table - File does
>> not exist
>>
>> I think I had this error last night :)
>> (Go to YARN to find the detailed error, and search for it on the internet.)
>> In my case, free space was less than 10% of the hard disk. Please check this.
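
The disk-space observation may be related to YARN's NodeManager disk health checker, which marks a local disk as bad once its utilization exceeds `yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage` (90% by default, i.e. less than 10% free); treat that connection as an assumption, not a diagnosis. The Python sketch below checks the free fraction of a local path with `shutil.disk_usage`; `disk_is_nearly_full` and the 10% threshold are illustrative choices, meant to be run against each NodeManager local directory.

```python
import shutil


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Fraction of the disk that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def disk_is_nearly_full(path: str, min_free: float = 0.10) -> bool:
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) < min_free
```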
>>
>> On 19/12/2016 11:35, "雨日听风" <491245...@qq.com> wrote:
>>
>>> When I build a cube in Kylin 1.6, I get an error in step #2: Redistribute
>>> Flat Hive Table.
>>>
>>> Please help! Thank you very much!
>>>
>>> Environment: Kylin 1.6 runs on an independent server, and there are 2 other
>>> server clusters: clusterA (Hive only) and clusterB (HBase only).
>>> The error is:
>>>
>>> 2016-12-19 10:28:00,641 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : Compute row count of flat hive table,
>>> cmd:
>>> 2016-12-19 10:28:00,642 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : hive -e "USE boco;
>>> SET dfs.replication=2;
>>> SET hive.exec.compress.output=true;
>>> SET hive.auto.convert.join.noconditionaltask=true;
>>> SET hive.auto.convert.join.noconditionaltask.size=1;
>>> SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
>>> SET mapreduce.job.split.metainfo.maxsize=-1;
>>> SET mapreduce.job.queuename=young;
>>> SET tez.queue.name=young;
>>>
>>> set hive.exec.compress.output=false;
>>>
>>> set hive.exec.compress.output=false;
>>> INSERT OVERWRITE DIRECTORY '/young/kylin_test/kylin_metadata_test/kylin-678266c0-ba0e-48b4-bdb5-6e578320375a/row_count' SELECT
>>> count(*) FROM kylin_intermediate_hbase_in_testCluster_CUBE_f9468805_eabf_4b54_bf2b_182e4c86214a;
>>>
>>> "
>>> 2016-12-19 10:28:03,277 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : WARNING: Use "yarn jar" to launch
>>> YARN applications.
>>> 2016-12-19 10:28:04,444 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 :
>>> 2016-12-19 10:28:04,445 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : Logging initialized using
>>> configuration in file:/etc/hive/conf/hive-log4j.properties
>>> 2016-12-19 10:28:14,700 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : OK
>>> 2016-12-19 10:28:14,703 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : Time taken: 0.935 seconds
>>> 2016-12-19 10:28:15,559 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : Query ID =
>>> young_20161219102814_a7104fd4-ba83-47fc-ac0b-0c9bef4e1969
>>> 2016-12-19 10:28:15,560 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : Total jobs = 1
>>> 2016-12-19 10:28:15,575 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : Launching Job 1 out of 1
>>> 2016-12-19 10:28:22,842 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 :
>>> 2016-12-19 10:28:22,842 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 :
>>> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : Status: Running (Executing on YARN
>>> cluster with App id application_1473415773736_1063281)
>>> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 :
>>> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : Map 1: -/- Reducer 2: 0/1
>>> 2016-12-19 10:28:23,307 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : Map 1: 0/2 Reducer 2: 0/1
>>> 2016-12-19 10:28:26,363 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : Map 1: 0/2 Reducer 2: 0/1
>>> 2016-12-19 10:28:26,567 INFO  [pool-8-thread-7]
>>> execution.AbstractExecutable:36 : Map 1: 0(+1)/2 Reducer 2: 

Re: Joint and Order in RowKey

2016-12-21 Thread Alberto Ramón
Yes, but my understanding is that if (ID, TXT) are a joint dimension, in the
drag-and-drop list you should see them together as one dimension.



Re: Joint and Order in RowKey

2016-12-21 Thread Li Yang
Maybe I didn't get the question, but the order of the rowkey columns can be
adjusted by dragging them up and down...

On Tue, Dec 20, 2016 at 2:46 AM, Alberto Ramón 
wrote:

> If we have these columns:
> [image: Inline image 1]
>
> With these joint dimensions:
> [image: Inline image 3]
>
> *Why can't I order these columns individually?* (Text, Id) now must be
> a tuple.
> [image: Inline image 4]
>
> (I accept suggestions about the order; anyo = year)
>


Re: How to workaround with columns with NULL value?

2016-12-21 Thread ShaoFeng Shi
Thanks, Alberto. On question 1: that's a known issue; we can prioritize it if
more people comment on it.
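
The bias described in question 1 (quoted below) is easy to see with three rows. The Python sketch below, with hypothetical helper names, compares dividing the sum by the total row count with dividing it by the number of non-NULL values, the latter computed from a 0/1 mark column like the one Da Tong mentions. It illustrates the arithmetic only, not how Kylin evaluates measures.

```python
from typing import Optional, Sequence


def biased_average(values: Sequence[Optional[float]]) -> Optional[float]:
    """Sum of non-NULL values divided by the number of *rows*.

    NULL rows still count in the denominator, which drags the result down.
    """
    if not values:
        return None
    total = sum(v for v in values if v is not None)
    return total / len(values)


def null_aware_average(values: Sequence[Optional[float]]) -> Optional[float]:
    """Sum of non-NULL values divided by the number of non-NULL values."""
    # Mark column: 1 if the value is present, 0 if it is NULL.
    marks = [0 if v is None else 1 for v in values]
    non_null = sum(marks)
    if non_null == 0:
        return None  # like SQL AVG over only NULLs
    total = sum(v for v in values if v is not None)
    return total / non_null
```

For `[10, NULL, 20]` the first gives 30 / 3 = 10 and the second 30 / 2 = 15.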

About 2: if the cardinality of that column isn't high, you can add it as a
dimension to get this capability. If it is high, I suggest you convert it into
ranges, e.g., 0, 100, 200, ...; you can do this easily with a Hive view.
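
The conversion suggested for a high-cardinality column amounts to bucketing: replace the raw value with the lower bound of its range, so the new column has few distinct values and can serve as a dimension. In a Hive view this might be an expression such as `FLOOR(col / 100) * 100`, where `col` is a placeholder name. The Python sketch below, with hypothetical helpers `bucket` and `count_at_least`, shows the mapping and how the "value over 100" filter from question 2 can be answered from the bucketed column alone.

```python
import math
from typing import Iterable


def bucket(value: float, width: int = 100) -> int:
    """Map a raw metric value to the lower bound of its range: 0, 100, 200, ..."""
    return math.floor(value / width) * width


def count_at_least(values: Iterable[float], threshold: int, width: int = 100) -> int:
    """Count rows whose raw value is >= threshold, using only the bucket column.

    Exact only when `threshold` is a multiple of `width`; other thresholds
    need a finer bucket width.
    """
    if threshold % width != 0:
        raise ValueError("threshold must be a multiple of the bucket width")
    return sum(1 for v in values if bucket(v, width) >= threshold)
```

The bucket width is the trade-off: a smaller width supports more thresholds but raises the cardinality of the new dimension.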

About 3: could you please open a JIRA with the error trace? Thanks.

2016-12-21 15:26 GMT+08:00 Alberto Ramón :

> About point 1: in KYLIN-2049 there is a comment
> by Shaofeng SHI.
>
> 2016-12-21 6:32 GMT+01:00 Da Tong :
>
>> Hi, all
>>
>> I am using Kylin 1.6.0. I have run into three problems:
>>
>> 1. In one of my metrics, some of the values are NULL. When I try to
>> calculate the average of the column, the COUNT function does not filter out
>> NULL values, which means the average is biased. One solution I found is to
>> use another column to mark whether the value is NULL or not, but there are
>> hundreds of columns like this, and I don't think adding hundreds of marker
>> columns as dimensions is a good approach. Any suggestions for this
>> situation?
>>
>> 2. I need to filter with a WHERE clause on some metric columns, for
>> example, to count the rows in which one field's value is over 100. It seems
>> that I have to add new columns such as A_FIELD_OVER_100 to achieve this.
>> But what if the *100* is a variable? Users of our system need to filter
>> results based on metric values; should I add metrics to the dimensions? Is
>> this requirement an uncommon case?
>>
>> 3. It seems that the issue of querying all-NULL columns is fixed in
>> KYLIN-1527, but I still get a NullPointerException from the
>> RawMeasureType.valueOf method. I just want to make sure: Kylin supports
>> columns in which all values are NULL, right?
>>
>> Any suggestions are welcome. Thank you.
>> --
>> TONG, Da / 佟达
>>
>
>


-- 
Best regards,

Shaofeng Shi 史少锋