Re: Full text query in Phoenix

2016-09-19 Thread Cheyenne Forbes
Hi James,

Thanks a lot, I found a link showing how to integrate HBase with Lucene:
https://itpeernetwork.intel.com/idh-hbase-lucene-integration/


Re: Using COUNT() with columns that don't use COUNT() when the table is join fails

2016-09-19 Thread Steve Terrell
I'm not an expert in traditional SQL or in Phoenix SQL, but my best guess
is "probably not".

But I'm curious as to why you would like to avoid the group by or the list
of columns.  I know it looks very wordy, but are there any technical
reasons?  In my experience SQL is hard to read by human eyes by nature, so
I just get used to it.

On Mon, Sep 19, 2016 at 10:06 AM, Cheyenne Forbes <
cheyenne.osanu.for...@gmail.com> wrote:

> Hi Steve,
>
> Thank you, it works when I add group by, can I avoid using group by or
> avoid adding all my columns to the group by if I have 10 columns being
> queried?
>


Re: Using COUNT() with columns that don't use COUNT() when the table is join fails

2016-09-19 Thread Cheyenne Forbes
I was wondering because it seems extra wordy


Re: Using COUNT() with columns that don't use COUNT() when the table is join fails

2016-09-19 Thread Steve Terrell
Hi!  I think you need something like
group by u.first_name
on the end.  Best guess.  :)

On Sun, Sep 18, 2016 at 11:03 PM, Cheyenne Forbes <
cheyenne.osanu.for...@gmail.com> wrote:

> This query fails:
>
> SELECT COUNT(fr.friend_1), u.first_name
> FROM users AS u
> LEFT JOIN friends AS fr ON u.id = fr.friend_2
>
> with:
>
> SQLException: ERROR 1018 (42Y27): Aggregate may not contain columns not in
> GROUP BY. U.FIRST_NAME
>
> TABLES:
>
> users table with these columns ( id, first_name, last_name )
>
> friends table with these columns ( friend_1, friend_2 )
>


How are Dataframes partitioned by default when using spark?

2016-09-19 Thread Long, Xindian
How are DataFrames/Datasets/RDDs partitioned by default when using Spark,
assuming the DataFrame/Dataset/RDD is the result of a query like this:

select col1, col2, col3 from table3 where col3 > xxx

I noticed that for HBase, a partitioner partitions the rowkeys based on region
splits. Can Phoenix do this as well?

I also read that if I use Spark with the Phoenix JDBC interface, "it's only able
to parallelize queries by partitioning on a numeric column. It also requires a
known lower bound, upper bound and partition count in order to create split
queries."

Question 1: If I specify options like these, is the partitioning based on
segmenting the range evenly, i.e. does each partition get a rowkey range of
width (upperLimit - lowerLimit) / partitionCount?

Question 2: If I do not specify any range, or the row key is not a numeric
column, how is the result partitioned using JDBC?


The Phoenix Spark plugin is said to be able to leverage the underlying splits
provided by Phoenix. Are there any example scenarios of that? E.g. can it
partition the resulting DataFrame based on the regions of the underlying HBase
table, so that Spark can take advantage of the locality of the data?

Thanks

Xindian
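
Question 1 above describes an even split of a numeric range. The sketch below
only illustrates that arithmetic in Python; it is written from the description
in the question, not taken from Spark's or Phoenix's source. The function name
and the column name `id` are hypothetical. Leaving the first and last ranges
open-ended is an assumption, chosen to match the Spark JDBC data source
documentation, which describes the bounds as deciding the partition stride
rather than filtering rows. NULL handling is left out.

```python
def even_range_predicates(column, lower, upper, num_partitions):
    """Build one WHERE predicate per partition by splitting [lower, upper) evenly.

    Hypothetical helper for illustration only.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if num_partitions == 1:
        return ["1 = 1"]  # a single partition reads everything

    stride = (upper - lower) // num_partitions
    if stride < 1:
        raise ValueError("range is too narrow for this many partitions")

    predicates = []
    for i in range(num_partitions):
        start = lower + i * stride
        end = start + stride
        if i == 0:
            # First range is open below, so rows under `lower` are kept.
            predicates.append(f"{column} < {end}")
        elif i == num_partitions - 1:
            # Last range is open above, so rows over `upper` are kept.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


for predicate in even_range_predicates("id", 0, 100, 4):
    print(predicate)
```

With these inputs the stride is 25, so each of the four split queries would
cover a quarter of the 0 to 100 range.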


Re: Full text query in Phoenix

2016-09-19 Thread Jean-Marc Spaggiari
HBase + Lily Indexer + SOLR will do that very well. As James said, Phoenix
might not help with full-text search. Google for that and you will find many
pointers to web articles or even books.

JMS

2016-09-19 9:05 GMT-04:00 Cheyenne Forbes :

> Hi James,
>
> Thanks a lot, I found a link showing how to integrate hbase with lucene
> https://itpeernetwork.intel.com/idh-hbase-lucene-integration/
>


Re: Using COUNT() with columns that don't use COUNT() when the table is join fails

2016-09-19 Thread Michael McAllister
This is really an ANSI SQL question. If you use an aggregate function, then you
need to specify which columns to group by. Any selected column that is not inside
an aggregate function needs to be in the GROUP BY clause.
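
As a minimal sketch of that rule, the snippet below runs the query from this
thread with a GROUP BY added. It uses Python's built-in sqlite3 module as a
stand-in for Phoenix, so it shows standard GROUP BY behaviour only, not
anything Phoenix-specific. The sample rows are made up. Grouping by u.id as
well as u.first_name is an assumption about the intent: it keeps two users who
share a first name in separate groups.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Phoenix here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE friends (friend_1 INTEGER, friend_2 INTEGER);
    INSERT INTO users VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Ann', 'Fox');
    INSERT INTO friends VALUES (2, 1), (3, 1), (1, 2);
""")

# Every selected column outside COUNT() is listed in GROUP BY.
query = """
    SELECT COUNT(fr.friend_1) AS friend_count, u.first_name
    FROM users AS u
    LEFT JOIN friends AS fr ON u.id = fr.friend_2
    GROUP BY u.id, u.first_name
    ORDER BY u.id
"""
rows = conn.execute(query).fetchall()
for friend_count, first_name in rows:
    print(first_name, friend_count)
```

Because of the LEFT JOIN, a user with no matching rows in friends still
appears, with a count of 0.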

Michael McAllister
Staff Data Warehouse Engineer | Decision Systems
mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha
| webex: https://h.a/mikewebex
This electronic communication (including any attachment) is confidential.  If 
you are not an intended recipient of this communication, please be advised that 
any disclosure, dissemination, distribution, copying or other use of this 
communication or any attachment is strictly prohibited.  If you have received 
this communication in error, please notify the sender immediately by reply 
e-mail and promptly destroy all electronic and printed copies of this 
communication and any attachment.

From: Cheyenne Forbes 
Reply-To: "user@phoenix.apache.org" 
Date: Monday, September 19, 2016 at 10:50 AM
To: "user@phoenix.apache.org" 
Subject: Re: Using COUNT() with columns that don't use COUNT() when the table 
is join fails

I was wondering because it seems extra wordy


Re: Combining an RVC query and a filter on a datatype smaller than 8 bytes causes an Illegal Data Exception

2016-09-19 Thread Samarth Jain
Kumar,

Can you try with the 4.8 release?



On Mon, Sep 19, 2016 at 2:54 PM, Kumar Palaniappan <
kpalaniap...@marinsoftware.com> wrote:

>
> Has anyone faced this issue?
>
> https://issues.apache.org/jira/browse/PHOENIX-3297
>
> And this one gives no rows
>
> SELECT * FROM TEST.RVC_TEST WHERE (COLONE, COLTWO) IN (1,2) AND COLTHREE
> =3 AND COLFOUR=4;
>
>
>
>


Re: Combining an RVC query and a filter on a datatype smaller than 8 bytes causes an Illegal Data Exception

2016-09-19 Thread Kumar Palaniappan
With just one tuple in the RVC IN list, it works.

But this one, with two or more tuples:

SELECT * FROM TEST.RVC_TEST WHERE (COLONE, COLTWO) IN ((1,2),(1,2)) AND
COLTHREE=3;

blows up.


On Mon, Sep 19, 2016 at 3:58 PM, Kumar Palaniappan <
kpalaniap...@marinsoftware.com> wrote:

> No, I didn't.
>
> But after wrapping the value list in an extra pair of parentheses, it worked.
>
> SELECT * FROM TEST.RVC_TEST WHERE (COLONE, COLTWO) IN ((1,2)) AND
> COLTHREE=3;
>
> SELECT * FROM TEST.RVC_TEST WHERE ((COLONE, COLTWO) IN ((1,2)) AND
> (COLFOUR=4));
>
> On Mon, Sep 19, 2016 at 2:56 PM, Samarth Jain  wrote:
>
>> Kumar,
>>
>> Can you try with the 4.8 release?
>>
>>
>>
>> On Mon, Sep 19, 2016 at 2:54 PM, Kumar Palaniappan <
>> kpalaniap...@marinsoftware.com> wrote:
>>
>>>
>>> Has anyone faced this issue?
>>>
>>> https://issues.apache.org/jira/browse/PHOENIX-3297
>>>
>>> And this one gives no rows
>>>
>>> SELECT * FROM TEST.RVC_TEST WHERE (COLONE, COLTWO) IN (1,2) AND COLTHREE
>>> =3 AND COLFOUR=4;
>>>
>>>
>>>
>>>
>>
>


Re: Using COUNT() with columns that don't use COUNT() when the table is join fails

2016-09-19 Thread Maryann Xue
Thank you very much for your answer, Michael! Yes, what Cheyenne tried to
use was simply not the right grammar.


Thanks,
Maryann

On Mon, Sep 19, 2016 at 10:47 AM, Michael McAllister <
mmcallis...@homeaway.com> wrote:

> This is really an ANSI SQL question. If you use an aggregate function,
> then you need to specify which columns to group by. Any selected column
> that is not inside an aggregate function needs to be in the GROUP BY
> clause.
>
>
>
> Michael McAllister
>
> Staff Data Warehouse Engineer | Decision Systems
>
> mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha
>  | webex: https://h.a/mikewebex
>
> *From: *Cheyenne Forbes 
> *Reply-To: *"user@phoenix.apache.org" 
> *Date: *Monday, September 19, 2016 at 10:50 AM
> *To: *"user@phoenix.apache.org" 
> *Subject: *Re: Using COUNT() with columns that don't use COUNT() when the
> table is join fails
>
>
>
> I was wondering because it seems extra wordy
>


Re: Combining an RVC query and a filter on a datatype smaller than 8 bytes causes an Illegal Data Exception

2016-09-19 Thread Kumar Palaniappan
No, I didn't.

But after wrapping the value list in an extra pair of parentheses, it worked.

SELECT * FROM TEST.RVC_TEST WHERE (COLONE, COLTWO) IN ((1,2)) AND
COLTHREE=3;

SELECT * FROM TEST.RVC_TEST WHERE ((COLONE, COLTWO) IN ((1,2)) AND
(COLFOUR=4));

On Mon, Sep 19, 2016 at 2:56 PM, Samarth Jain  wrote:

> Kumar,
>
> Can you try with the 4.8 release?
>
>
>
> On Mon, Sep 19, 2016 at 2:54 PM, Kumar Palaniappan <
> kpalaniap...@marinsoftware.com> wrote:
>
>>
>> Has anyone faced this issue?
>>
>> https://issues.apache.org/jira/browse/PHOENIX-3297
>>
>> And this one gives no rows
>>
>> SELECT * FROM TEST.RVC_TEST WHERE (COLONE, COLTWO) IN (1,2) AND COLTHREE
>> =3 AND COLFOUR=4;
>>
>>
>>
>>
>


Re: Phoenix + Spark + JDBC + Kerberos?

2016-09-19 Thread Jean-Marc Spaggiari
Thanks for the pointer to PHOENIX-3189 Josh. I don't think we are facing
that.

We will try to activate the debug mode on Kerberos and retry. Good idea!

I will keep this thread updated if we find something...

JMS

2016-09-15 17:39 GMT-04:00 Josh Elser :

> Cool, thanks for the info, JM. Thinking out loud..
>
> * Could be missing/inaccurate /etc/krb5.conf on the nodes running spark
> tasks
> * Could try setting the Java system property sun.security.krb5.debug=true
> in the Spark executors
> * Could try to set org.apache.hadoop.security=DEBUG in log4j config
>
> Hard to guess at the real issue without knowing more :). Any more context
> you can share, I'd be happy to try to help.
>
> (ps. obligatory warning about PHOENIX-3189 if you're using 4.8.0)
>
> Jean-Marc Spaggiari wrote:
>
>> Using the keytab in the JDBC URL. That's the way we use it locally, and we
>> also tried to run command-line applications directly from the worker
>> nodes and it works. But inside the Spark executor it doesn't...
>>
>> 2016-09-15 13:07 GMT-04:00 Josh Elser:
>>
>> How do you expect JDBC on Spark Kerberos authentication to work? Are
>> you using the principal+keytab options in the Phoenix JDBC URL or is
>> Spark itself obtaining a ticket for you (via some "magic")?
>>
>>
>> Jean-Marc Spaggiari wrote:
>>
>> Hi,
>>
>> I tried to build a small app all under Kerberos.
>>
>> JDBC to Phoenix works
>> Client to HBase works
>> Client (puts) on Spark to HBase works.
>> But JDBC on Spark to HBase fails with a message like
>> "GSSException: No valid credentials provided (Mechanism level: Failed to
>> find any Kerberos tgt)]"
>>
>> Keytab is accessible on all the nodes.
>>
>> Keytab belongs to the user running the job, and executors are running
>> under that user name. So this is fine.
>>
>> Any idea of what this might be?
>>
>> Thanks,
>>
>> JMS
>>
>>
>>