Re: Problem in finding the largest value of an indexed column

2015-07-10 Thread James Taylor
Sounds like something else is going wrong. Can you adapt your test by
setting the MAX_FILESIZE very low for your table (so that it splits after 4
or 5 rows are added) and package it up as a unit test?
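
For anyone reproducing this, a minimal sketch of such a test table (an assumption, not from the thread): MAX_FILESIZE here is the HBase table property, which Phoenix passes through to the table descriptor, and the 1024-byte value is purely illustrative, small enough to force splits after a handful of rows:

  CREATE TABLE IF NOT EXISTS t1 (
      uid BIGINT NOT NULL,
      timestamp BIGINT NOT NULL,
      eventName VARCHAR,
      CONSTRAINT my_pk PRIMARY KEY (uid, timestamp))
      MAX_FILESIZE=1024;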

On Thu, Jul 9, 2015 at 1:44 PM, Yufan Liu yli...@kent.edu wrote:

 Just got a chance to revisit this issue: I have rebuilt the index and it
 still returns the unexpected result. Using the test case, I inserted enough
 rows to make the table auto-split, and that reproduces the problem too.
 It seems Phoenix still has trouble returning the last row sorted by the first
 component of the primary key on split tables. Maybe there is another issue
 besides PHOENIX-2096? The Phoenix I am using is built from the latest
 4.x-HBase-0.98 branch, which includes the patch for PHOENIX-2096.

 2015-07-02 19:55 GMT-07:00 James Taylor jamestay...@apache.org:

 On further investigation, I believe it should have been working before. I
 did a bit of cleanup and attached a new patch to PHOENIX-2096, but this
 would only prevent a merge sort when one is not required (basically
 improving performance).

 Maybe your index is invalid? You can try rebuilding with this command:
 https://phoenix.apache.org/language/index.html#alter_index
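
 For reference, a minimal sketch of that rebuild for the index in this thread, per the linked grammar page:

   ALTER INDEX timestamp_index ON t1 REBUILD;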

 On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu yli...@kent.edu wrote:

 The query on the test dataset returns the expected result with the
 patch. But on the original dataset (10 million rows, 6 regions), it still
 returns the same unexpected result; I will dig more into this. Thank you,
 James!

 2015-07-02 9:58 GMT-07:00 Yufan Liu yli...@kent.edu:

 Sure, let me have a try

 2015-07-02 9:46 GMT-07:00 James Taylor jamestay...@apache.org:

 Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch.
 Would you mind confirming that this fixes the issue you're seeing?

 James

 On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote:

 I'm using 4.4.0-HBase-0.98

 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org:

 Yufan,
 What version of Phoenix are you using?
 Thanks,
 James

 On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote:

 After running more tests, I find that this problem happens after
 the table gets split.

 Here is the DDL I use to create the table and index:
 CREATE TABLE IF NOT EXISTS t1 (
 uid BIGINT NOT NULL,
 timestamp BIGINT NOT NULL,
 eventName VARCHAR,
 CONSTRAINT my_pk PRIMARY KEY (uid, timestamp))
 COMPRESSION='SNAPPY';

 CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName);

 Attached is the sample data I used for the test. It has about 4000 rows.
 When the timestamp_index table has one region, the query returns the correct
 result: 144048443, but when I manually split it into 4 regions (using the
 hbase tool), it returns 143024961.
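
 For reference, a manual split like that can be done from the hbase shell. A minimal sketch, assuming the index table is named TIMESTAMP_INDEX (Phoenix upper-cases unquoted identifiers); running it again on the resulting daughter regions yields four regions:

   split 'TIMESTAMP_INDEX'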

 Let me know if you find anything. Thanks!


 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org:

 If you could put together a complete test (including your DDL and upserts of
 data), that would be much appreciated.
 Thanks,
 James

 On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu
 wrote:

 I have tried the query SELECT timestamp FROM t1 ORDER BY
 timestamp DESC NULLS LAST LIMIT 1, but it still returns the same
 unexpected result. There seem to be some related internal problems.

 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org:

 Yes, a reverse scan will be leveraged when possible. Make sure you use
 NULLS LAST in your ORDER BY, as rows are ordered with nulls first.

 On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu
 wrote:

 I used an HBase reverse scan to find the last row of the index
 table, and it returned the expected result. I would like to know: are Phoenix's
 ORDER BY ... DESC queries implemented on top of the HBase reverse scan?
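
 For reference, the raw reverse scan being described can be done from the hbase shell. A minimal sketch, again assuming the index table name is TIMESTAMP_INDEX; REVERSED and LIMIT are standard shell scan options in HBase 0.98+:

   scan 'TIMESTAMP_INDEX', {REVERSED => true, LIMIT => 1}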

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the advice I could give
 you. There are people here far more experienced than I am who should be able
 to give you deeper advice. Have a great weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an
 indexed column



 Hi Michael,

 Thanks for the advice. For the first one, the plan is CLIENT 67-CHUNK PARALLEL
 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER
 AGGREGATE INTO SINGLE ROW, which is as expected. For the second one, it is
 CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER
 FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT, which looks correct, but it
 still returns the unexpected result.
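
 For context, the two plans above would correspond to queries along these lines; the exact SQL is an assumption reconstructed from the plans and the DDL earlier in the thread:

   SELECT MAX(timestamp) FROM t1;
   -- aggregate plan: FULL SCAN OVER TIMESTAMP_INDEX ... SERVER AGGREGATE INTO SINGLE ROW

   SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1;
   -- reverse-scan plan: SERIAL 1-WAY REVERSE FULL SCAN ... SERVER 1 ROW LIMIT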



 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com:

 Yufan



 Have you tried using the EXPLAIN command to see what plan is
 being used to access the data?
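
 A minimal sketch of that, using the ORDER BY query discussed in this thread (table and column names from the DDL shown earlier in the digest):

   EXPLAIN SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1;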



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 

RE: Permissions Question

2015-07-10 Thread Riesland, Zack
Gabriel,

This is EXACTLY what I needed. Thanks!

From: Gabriel Reid [mailto:gabriel.r...@gmail.com]
Sent: Wednesday, July 08, 2015 1:40 AM
To: user@phoenix.apache.org
Subject: Re: Permissions Question

Hi Zack,

There are two options that I know of, and I think that both of them should work.

The first is that you can supply a custom output directory to the bulk loader using 
the -o parameter (see http://phoenix.apache.org/bulk_dataload.html). In this 
way you can ensure that the output directory doesn't automatically change every 
time you run the jar.
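
A minimal sketch of such an invocation, assuming the CsvBulkLoadTool documented on that page; the jar name, table name, and paths are placeholders:

  hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      -t MY_TABLE -i /data/input.csv -o /tmp/bulkload-output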

The second is that you should be able to supply the parameter 
-Dfs.permissions.umask-mode=000 to the bulk load tool (before any other 
parameters). This manipulates the umask with which the files will be written, 
making the output readable and writable by everyone (which then allows hbase to 
move it under its own directory structure).
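
A sketch of that second option, with the -D property placed right after the tool class name and before its own parameters; the same placeholder names as above:

  hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      -Dfs.permissions.umask-mode=000 -t MY_TABLE -i /data/input.csv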

Assuming that at least one of these works for you (or even if they don't), 
could you add a ticket in the Phoenix JIRA 
(https://issues.apache.org/jira/browse/PHOENIX) so that we can track getting a 
more structural fix for this issue?

- Gabriel


On Tue, Jul 7, 2015 at 4:53 PM Riesland, Zack zack.riesl...@sensus.com wrote:
Thanks Krishna,

The HFiles are stored in, for example, 
/tmp/daa6119d-f49e-485e-a6fe-1405d9c3f2a4/ followed by a directory structure based on the table name.

‘tmp’ is owned by ‘hdfs’ in group ‘hdfs’.

‘daa6119d-f49e-485e-a6fe-1405d9c3f2a4’ is owned by my script user (‘user1’ for 
example) in group ‘hdfs’.

I cannot run the script as ‘hbase’, and the name of the folder 
(‘daa6119d-f49e-485e-a6fe-1405d9c3f2a4’ in this case) will change each time I 
run the jar, so explicitly doing a chown on that folder won’t help.

Do you know what change I need to make to ‘user1’ so that the HFiles created by 
that user can be written into HBase?

From: Krishna [mailto:research...@gmail.com]
Sent: Monday, July 06, 2015 3:11 PM
To: user@phoenix.apache.org
Subject: Re: Permissions Question

The owner of the directory containing the HFiles should be the 'hbase' user, and 
ownership can be set using the 'chown' command.
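
For reference, a sketch of what that would look like on HDFS; the path is the example one from earlier in the thread, and the hbase:hbase owner/group is an assumption about the cluster's setup:

  hdfs dfs -chown -R hbase:hbase /tmp/daa6119d-f49e-485e-a6fe-1405d9c3f2a4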

On Mon, Jul 6, 2015 at 7:12 AM, Riesland, Zack zack.riesl...@sensus.com wrote:
I’ve been running CsvBulkLoader as ‘hbase’ and that has worked well.

But I now need to integrate with some scripts that will be run as another user.

When I run under a different account, the CsvBulkLoader runs and creates the 
HFiles, but then encounters permission issues attempting to write the data to 
HBase.

Can someone point me in the right direction for solving this?

How can I give ‘hbase’ write permissions to a different user?

Thanks!