Re: Spark 2.1.1 (Scala 2.11.8) write to Phoenix 4.7 (HBase 1.1.2)

2018-01-30 Thread Margusja
I also see that Logging was moved to internal/Logging. But is there a package I
can use for my environment?

Margus


> On 30 Jan 2018, at 17:00, Margusja  wrote:
> 
> Hi
> 
> I followed the page (https://phoenix.apache.org/phoenix_spark.html) and am
> trying to save to Phoenix.
>
> With spark-1.6.3 it succeeds, but with spark-2.1.1 it does not.
> The first error I get with spark-2.1.1 is:
> 
> Error:scalac: missing or invalid dependency detected while loading class file 
> 'ProductRDDFunctions.class'.
> Could not access type Logging in package org.apache.spark,
> because it (or its dependencies) are missing. Check your build definition for
> missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see 
> the problematic classpath.)
> A full rebuild may help if 'ProductRDDFunctions.class' was compiled against 
> an incompatible version of org.apache.spark.
> 
> I can see that Logging was removed after 1.6.3 and does not exist in 2.1.1.
> 
> What are my options?
> 
> Br
> Margus



Spark 2.1.1 (Scala 2.11.8) write to Phoenix 4.7 (HBase 1.1.2)

2018-01-30 Thread Margusja
Hi

I followed the page (https://phoenix.apache.org/phoenix_spark.html) and am trying
to save to Phoenix.

With spark-1.6.3 it succeeds, but with spark-2.1.1 it does not.
The first error I get with spark-2.1.1 is:

Error:scalac: missing or invalid dependency detected while loading class file 
'ProductRDDFunctions.class'.
Could not access type Logging in package org.apache.spark,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the 
problematic classpath.)
A full rebuild may help if 'ProductRDDFunctions.class' was compiled against an 
incompatible version of org.apache.spark.

I can see that Logging was removed after 1.6.3 and does not exist in 2.1.1.

What are my options?

Br
Margus



Re: Is first query to a table region way slower?

2018-01-30 Thread Pedro Boado
Same behaviour after truncating the stats table and after updating statistics
on the table.

Does size explain the higher lag? In the latter example the slow table has
100 regions and 2 billion records. As I'm just querying the first value, it
shouldn't be affected, right?
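
The healthcheck loop described in the quoted messages below (one cheap `LIMIT 1` query per table, repeated over several rounds, with `rs.next()` called so the scan actually runs) can be reproduced with plain JDBC. This is a minimal sketch, not the code used in the thread: `TableHealthcheck` and its methods are hypothetical names, it assumes unquoted table names, it uses the `SELECT 1 FROM T LIMIT 1` form suggested as the least expensive query, and opening one connection per round is an assumption standing in for the unpooled setup described.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class TableHealthcheck {
    // Builds the cheap probe query; rejects anything that is not a plain
    // (optionally schema-qualified) identifier, since it is concatenated into SQL.
    static String probeSql(String table) {
        if (!table.matches("[A-Za-z_][A-Za-z0-9_.]*")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return "SELECT 1 FROM " + table + " LIMIT 1";
    }

    // Executes the probe and calls rs.next() so the first row is actually
    // fetched; returns the elapsed wall-clock time in milliseconds.
    static long probeMillis(Connection conn, String table) throws SQLException {
        long start = System.nanoTime();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(probeSql(table))) {
            rs.next();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Repeats the probe over all tables for the given number of rounds,
    // opening a fresh (unpooled) connection for each round (an assumption).
    static void runHealthchecks(String jdbcUrl, List<String> tables, int rounds)
            throws SQLException {
        for (int round = 1; round <= rounds; round++) {
            System.out.printf("Starting healthcheck '%d'%n", round);
            try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
                for (String table : tables) {
                    System.out.printf("Checking table %s state took %d msec.%n",
                            table, probeMillis(conn, table));
                }
            }
        }
    }
}
```

For example, `TableHealthcheck.runHealthchecks("jdbc:phoenix:<zookeeper-quorum>", Arrays.asList("TABLE1", "TABLE2", "TABLE3", "TABLE4"), 5)`, with the Phoenix client jar on the classpath, should print lines in the same shape as the log quoted below; `<zookeeper-quorum>` is a placeholder.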

On 30 Jan 2018 00:57, "Mujtaba Chohan"  wrote:

> Just to remove one variable, can you repeat the same test after truncating
> the Phoenix stats table? (Either truncate SYSTEM.STATS from the HBase shell or
> use SQL: delete from SYSTEM.STATS)
>
> On Mon, Jan 29, 2018 at 4:36 PM, Pedro Boado 
> wrote:
>
>> Yes, there is an rs.next().
>>
>> In fact, if I run this SELECT * FROM table LIMIT 1 in a loop over four
>> different tables in the same cluster, I get relatively consistent response
>> times across iterations, but the same pattern every time I execute the code
>> again. So basically the first call per table is way slower.
>>
>> And for some reason the call to TABLE4 is way slower than the others (the
>> only difference is that this table is quite big compared to the others).
>>
>> By hooking a jmeter to the VM, I see new threads being created and
>> destroyed in both the hconnection and Phoenix thread pools per loop (I am
>> not pooling connections), and quite a lot of network IO in the IPC network
>> thread to one of the RSs during the 4 seconds the first query takes
>> (basically this thread is doing network IO during 60-70% of the 4200 msec).
>>
>>
>>  Starting healthcheck '1'
>>  Checking table TABLE1 state took 874 msec.
>>  Checking table TABLE2 state took 471 msec.
>>  Checking table TABLE3 state took 844 msec.
>>  Checking table TABLE4 state took 4234 msec.
>>  Starting healthcheck '2'
>>  Checking table TABLE1 state took 103 msec.
>>  Checking table TABLE2 state took 98 msec.
>>  Checking table TABLE3 state took 78 msec.
>>  Checking table TABLE4 state took 148 msec.
>>  Starting healthcheck '3'
>>  Checking table TABLE1 state took 351 msec.
>>  Checking table TABLE2 state took 108 msec.
>>  Checking table TABLE3 state took 84 msec.
>>  Checking table TABLE4 state took 137 msec.
>>  Starting healthcheck '4'
>>  Checking table TABLE1 state took 102 msec.
>>  Checking table TABLE2 state took 94 msec.
>>  Checking table TABLE3 state took 77 msec.
>>  Checking table TABLE4 state took 138 msec.
>>  Starting healthcheck '5'
>>  Checking table TABLE1 state took 103 msec.
>>  Checking table TABLE2 state took 93 msec.
>>  Checking table TABLE3 state took 77 msec.
>>  Checking table TABLE4 state took 142 msec.
>> ...
>>
>>
>> Any other ideas?
>>
>> On 29 Jan 2018 01:55, "James Taylor"  wrote:
>>
>>> Did you do an rs.next() on the first query? Sounds related to
>>> HConnection establishment. Also, the least expensive query is
>>> SELECT 1 FROM T LIMIT 1.
>>>
>>> Thanks,
>>> James
>>>
>>> On Sun, Jan 28, 2018 at 5:39 PM Pedro Boado 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm running into issues with a Java Spring Boot app that ends up querying
>>>> a Phoenix cluster (from outside the cluster) through the non-thin client.
>>>>
>>>> Basically, this application has high latency (around 2 to 4 seconds) for
>>>> the first query by primary key to each region of a table with 180M
>>>> records (and 10 regions). Subsequent calls (for different keys within the
>>>> same region) have an average response time of ~60-80 ms. No secondary
>>>> indexes are involved, and there are no writes to the table at all during
>>>> these queries.
>>>>
>>>> I don't think it's related to HConnection establishment, as the connection
>>>> was already established before the query ran (a SELECT * FROM table
>>>> LIMIT 1 is executed as soon as the datasource is created).
>>>>
>>>> I've been doing some quick profiling, and almost all the time is spent
>>>> inside the actual JDBC call.
>>>>
>>>> So here's the question: in your experience, is this normal behaviour (so
>>>> that I have to work around the problem in application code by warming up
>>>> connections during app startup), or is it something unusual? Any
>>>> experience reducing first-query latencies?
>>>>
>>>> Thanks!


>
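
The warm-up workaround raised in the original question (running queries while the app starts, before it serves traffic) could look roughly like the sketch below. It is only an illustration under assumptions: `StartupWarmup`, `TableProbe` and `warmUp` are hypothetical names, and the probe itself is left as a hook (for instance the JDBC `SELECT 1 FROM <table> LIMIT 1` plus `rs.next()` mentioned in the thread). Because the slow calls described are the first primary-key lookup per region, covering every region would presumably need one probe key per region rather than one query per table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

final class StartupWarmup {
    // Hypothetical hook: issues one cheap query against the given table,
    // e.g. SELECT 1 FROM <table> LIMIT 1 followed by rs.next() over JDBC.
    @FunctionalInterface
    interface TableProbe {
        void probe(String table) throws Exception;
    }

    // Runs the probe once per table, in the given order, and returns the
    // elapsed milliseconds per table (insertion-ordered). A failed probe is
    // logged and recorded as -1 so one bad table does not abort startup.
    static Map<String, Long> warmUp(List<String> tables, TableProbe probe) {
        Map<String, Long> timings = new LinkedHashMap<>();
        for (String table : tables) {
            long start = System.nanoTime();
            try {
                probe.probe(table);
                timings.put(table, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
            } catch (Exception e) {
                System.err.println("Warm-up failed for " + table + ": " + e);
                timings.put(table, -1L);
            }
        }
        return timings;
    }
}
```

Recording failures instead of throwing is a design choice: the app can still start and report the slow or failing tables. In a Spring Boot app this would typically be called from a startup hook such as an `ApplicationRunner` bean.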