Re: [ANNOUNCE] Apache Phoenix 4.8.1 is available for download

2016-09-29 Thread James Taylor
Apache Phoenix enables OLTP and operational analytics for Hadoop through
SQL support, using HBase as its backing store, and integrates with other
projects in the ecosystem such as Spark, Hive, Pig, Flume, and MapReduce.
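
As a quick illustration of that SQL layer, here is a minimal sketch of
querying Phoenix from Python through the Phoenix Query Server (a sketch only:
it assumes PQS is running on localhost:8765, the python phoenixdb package is
installed, and the table name is made up):

import phoenixdb

# Connect through the Phoenix Query Server; autocommit makes each statement
# take effect immediately.
conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS demo (id BIGINT PRIMARY KEY, name VARCHAR)")
cur.execute("UPSERT INTO demo VALUES (?, ?)", (1, "hello"))
cur.execute("SELECT id, name FROM demo")
print(cur.fetchall())  # rows are read back from the underlying HBase table
conn.close()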

On Tue, Sep 27, 2016 at 10:27 PM,  wrote:

> The Phoenix Team is pleased to announce the immediate release of Apache
> Phoenix 4.8.1.
> Download it from your favorite Apache mirror [1].
>
> Apache Phoenix 4.8.1 is a bug fix release for the Phoenix 4.8 release line,
> compatible with Apache HBase 0.98, 1.0, 1.1 & 1.2.
>
> This release fixes the following 43 issues:
> [PHOENIX-1367] - VIEW derived from another VIEW doesn't use parent
> VIEW indexes
> [PHOENIX-3195] - Slight safety improvement for using
> DistinctPrefixFilter
> [PHOENIX-3228] - Index tables should not be configured with a
> custom/smaller MAX_FILESIZE
> [PHOENIX-930] - duplicated columns cause query exception and drop
> table exception
> [PHOENIX-1647] - Correctly return that Phoenix supports schema name
> references in DatabaseMetaData
> [PHOENIX-2336] - Queries with small case column-names return empty
> result-set when working with Spark Datasource Plugin
> [PHOENIX-2474] - Cannot round to a negative precision (to the left of
> the decimal)
> [PHOENIX-2641] - Implicit wildcard in LIKE predicate search pattern
> [PHOENIX-2645] - Wildcard characters do not match newline characters
> [PHOENIX-2853] - Delete Immutable rows from View does not work if
> immutable index(secondary index) exists
> [PHOENIX-2944] - DATE Comparison Broken
> [PHOENIX-2946] - Projected comparison between date and timestamp
> columns always returns true
> [PHOENIX-2995] - Write performance severely degrades with large number
> of views
> [PHOENIX-3046] - NOT LIKE with wildcard unexpectedly returns results
> [PHOENIX-3054] - Counting zero null rows returns an empty result set
> [PHOENIX-3072] - Deadlock on region opening with secondary index
> recovery
> [PHOENIX-3148] - Reduce size of PTable so that more tables can be
> cached in the metadata cache.
> [PHOENIX-3162] - TableNotFoundException might be thrown when an index
> dropped while upserting.
> [PHOENIX-3164] - PhoenixConnection leak in PQS with security enabled
> [PHOENIX-3170] - Remove the futuretask from the list if
> StaleRegionBoundaryCacheException is thrown while initializing the
> scanners
> [PHOENIX-3175] - Unnecessary UGI proxy user impersonation check
> [PHOENIX-3185] - Error: ERROR 514 (42892): A duplicate column name was
> detected in the object definition or ALTER TABLE statement.
> columnName=TEST_TABLE.C1 (state=42892,code=514)
> [PHOENIX-3189] - HBase/ZooKeeper connection leaks when providing
> principal/keytab in JDBC url
> [PHOENIX-3203] - Tenant cache lookup in Global Cache fails in certain
> conditions
> [PHOENIX-3207] - Fix compilation failure on 4.8-HBase-1.2,
> 4.8-HBase-1.1 and 4.8-HBase-1.0 branches after PHOENIX-3148
> [PHOENIX-3210] - Exception trying to cast Double to BigDecimal in
> UpsertCompiler
> [PHOENIX-3223] - Add hadoop classpath to PQS classpath
> [PHOENIX-3230] - Upgrade code running concurrently on different JVMs
> could make clients unusable
> [PHOENIX-3237] - Automatic rebuild of disabled index will fail if
> indexes of two tables are disabled at the same time
> [PHOENIX-3246] - U+2002 (En Space) not handled as whitespace in grammar
> [PHOENIX-3260] - MetadataRegionObserver.postOpen() can prevent region
> server from shutting down for a long duration
> [PHOENIX-3268] - Upgrade to Tephra 0.9.0
> [PHOENIX-3280] - Automatic attempt to rebuild all disabled index
> [PHOENIX-3291] - Do not throw return value of Throwables#propagate call
> [PHOENIX-3307] - Backward compatibility fails for tables with index
> (4.7.0 client - 4.8.1 server)
> [PHOENIX-3323] - make_rc script fails to build the RC
> [PHOENIX-2785] - Do not store NULLs for immutable tables
> [PHOENIX-3081] - Misleading exception on async stats update after
> major compaction
> [PHOENIX-3116] - Support incompatible HBase 1.1.5 and HBase 1.2.2
> [PHOENIX-808] - Create snapshot of SYSTEM.CATALOG prior to upgrade and
> restore on any failure
> [PHOENIX-2990] - Ensure documentation on "time/date"
> datatypes/functions acknowledge lack of JDBC compliance
> [PHOENIX-2991] - Add missing documentation for functions
> [PHOENIX-3255] - Increase test coverage for TIMESTAMP
>
> See also the full release notes [2].
>
> Yours,
> The Apache Phoenix Team
>
> [1] http://www.apache.org/dyn/closer.lua/phoenix/
> [2] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315120&version=12337964
>
>


Re: Using CsvBulkInsert With compressed Hive data

2016-09-29 Thread Gabriel Reid
Hi Zack,

Am I correct in understanding that the files are under a structure like
x/.deflate/csv_file.csv?

In that case, I believe everything under the .deflate directory will
simply be ignored, as directories whose names start with a period are
considered "hidden" files.

However, assuming the data under those directories is compressed using
a compression codec supported on your cluster (e.g. gz, snappy, etc),
there shouldn't be a problem using them as input for the CSV import.
In other words, the compression probably isn't an issue, but the
directory naming probably is.
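
To illustrate, here is a rough Python model of the default input-path filter
that Hadoop's FileInputFormat applies (a sketch only; the example paths are
made up):

# Any path component starting with '.' or '_' is treated as hidden, and a
# hidden directory hides everything beneath it -- which is why files under
# x/.deflate/ never reach the CSV importer.
def is_read_as_input(path):
    return not any(part.startswith((".", "_"))
                   for part in path.split("/") if part)

for p in ["x/000000_0.deflate", "x/.deflate/csv_file.csv", "x/_SUCCESS"]:
    print(p, "->", "read" if is_read_as_input(p) else "skipped")
# x/000000_0.deflate      -> read     (compressed, but visible)
# x/.deflate/csv_file.csv -> skipped  (under a hidden directory)
# x/_SUCCESS              -> skipped  (job marker file)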

- Gabriel

On Thu, Sep 29, 2016 at 7:14 PM, Riesland, Zack
 wrote:
> For a very long time, we’ve had a workflow that looks like this:
>
>
>
> Export data from a compressed ORC Hive table to another Hive table that is
> “external stored as text file”. No compression specified.
>
>
>
> Then, we point to the folder “x” behind that new table and use CsvBulkInsert
> to get the data into HBase.
>
>
>
> Today, I noticed that the data has not been getting into HBase since late
> August.
>
>
>
> After some clicking around, it looks like this is happening because we have
> hive.exec.compress.output set to true, so the data in folder “x” is
> compressed in “.deflate” folders.
>
>
>
> However, it looks like someone changed this setting to true 4 months ago.
>
>
>
> So we should either be missing 4 months of data, or this should work.
>
>
>
> Thus my question: does CSV bulk insert work with compressed output like
> this?
>
>
>
>


Re: Loading via MapReduce, Not Moving HFiles to HBase

2016-09-29 Thread Gabriel Reid
Hi Ravi,

I see in your output that the final upload of created HFiles is failing due
to the number of HFiles created per region. I also just noticed that you're
supplying the hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily
config parameter.

Could you post the exact, complete command that you're using to run this
import?

Also, be aware that overriding the max hfiles per region setting is
probably not the best way to get around this -- the fact that you've got so
many HFiles per region probably indicates that you should have more
regions. See this discussion in an earlier thread[1] for more info.
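
If you do decide to change the table rather than the setting, one option is to
(re)create it pre-split into more regions, e.g. via salting. The following is
only a sketch: the table name, columns, salt bucket count and PQS address are
placeholders, and pre-splitting with SPLIT ON would work just as well.

import phoenixdb

# Recreating the target table with SALT_BUCKETS pre-splits it into that many
# regions, so the bulk load produces fewer HFiles per region.
conn = phoenixdb.connect("http://pqs-host:8765/", autocommit=True)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS my_bulk_table (
        pk VARCHAR NOT NULL PRIMARY KEY,
        val VARCHAR
    ) SALT_BUCKETS=32
""")
conn.close()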

- Gabriel

1.
https://lists.apache.org/list.html?user@phoenix.apache.org:lte=3M:CsvBulkLoadTool%20with%20%7E75GB%20file

On Thu, Sep 29, 2016 at 5:16 PM, Ravi Kumar Bommada 
wrote:

> Hi Gabriel,
>
>
>
> Please find the logs attached.
>
>
>
> R’s
>
> Ravi Kumar B
>
>
>
> *From:* Gabriel Reid [mailto:gabriel.r...@gmail.com]
> *Sent:* Wednesday, September 28, 2016 5:51 PM
> *To:* user@phoenix.apache.org
> *Subject:* Re: Loading via MapReduce, Not Moving HFiles to HBase
>
>
>
> Hi Ravi,
>
>
>
> It looks like those log file entries you posted are from a mapreduce task.
> Could you post the output of the command that you're using to start the
> actual job (i.e. the console output of "hadoop jar ...")?
>
>
>
> - Gabriel
>
>
>
> On Wed, Sep 28, 2016 at 1:49 PM, Ravi Kumar Bommada <
> braviku...@juniper.net> wrote:
>
> Hi All,
>
>
>
> I’m trying to load data via Phoenix MapReduce, referring to the screen below:
>
>
>
>
>
> HFiles are getting created (176 HFiles of about 300 MB each), but after that
> the files are not moving to HBase, i.e. when I query HBase I am not able to
> see the data. According to the logs below, the data commit is successful.
>
>
>
> Please suggest if I’m missing any configuration.
>
>
>
> Provided:
>
>
>
> Using property: -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=1024
>
>
>
> Last Few Logs:
>
> 2016-09-27 07:27:35,845 INFO [main] org.apache.hadoop.io.compress.CodecPool:
> Got brand-new decompressor [.snappy]
>
> 2016-09-27 07:27:35,846 INFO [main] org.apache.hadoop.io.compress.CodecPool:
> Got brand-new decompressor [.snappy]
>
> 2016-09-27 07:27:35,846 INFO [main] org.apache.hadoop.io.compress.CodecPool:
> Got brand-new decompressor [.snappy]
>
> 2016-09-27 07:27:35,846 INFO [main] org.apache.hadoop.mapred.Merger:
> Merging 64 intermediate segments out of a total of 127
>
> 2016-09-27 07:28:21,238 INFO [main] org.apache.hadoop.mapred.Merger: Down
> to the last merge-pass, with 64 segments left of total size: -40111574372
> bytes
>
> 2016-09-27 07:30:24,933 INFO [main] org.apache.hadoop.mapred.Merger:
> Merging 179 sorted segments
>
> 2016-09-27 07:30:24,965 INFO [main] org.apache.hadoop.mapred.Merger: Down
> to the last merge-pass, with 0 segments left of total size: 4736 bytes
>
> 2016-09-27 07:30:24,967 INFO [main] org.apache.hadoop.mapred.Merger:
> Merging 179 sorted segments
>
> 2016-09-27 07:30:24,999 INFO [main] org.apache.hadoop.mapred.Merger: Down
> to the last merge-pass, with 0 segments left of total size: 4736 bytes
>
> 2016-09-27 07:30:25,000 INFO [main] org.apache.hadoop.mapred.Merger:
> Merging 179 sorted segments
>
> 2016-09-27 07:30:25,033 INFO [main] org.apache.hadoop.mapred.Merger: Down
> to the last merge-pass, with 0 segments left of total size: 4736 bytes
>
> 2016-09-27 07:30:25,035 INFO [main] org.apache.hadoop.mapred.Merger:
> Merging 179 sorted segments
>
> 2016-09-27 07:30:25,068 INFO [main] org.apache.hadoop.mapred.Merger: Down
> to the last merge-pass, with 0 segments left of total size: 4736 bytes
>
> 2016-09-27 07:30:25,723 INFO [main] org.apache.hadoop.mapred.Task:
> Task:attempt_1467713708066_29809_m_16_0 is done. And is in the
> process of committing
>
> 2016-09-27 07:30:25,788 INFO [main] org.apache.hadoop.mapred.Task: Task
> 'attempt_1467713708066_29809_m_16_0' done.
>
>
>
>
>
> Regards,
>
>
>
> Ravi Kumar B
>
> Mob: +91 9591144511
>
>
>
>
>
>
>
>
>


RE: How are Dataframes partitioned by default when using spark?

2016-09-29 Thread Long, Xindian
Hi, Josh:

Thanks for the reply. I still have some questions/comments.

The phoenix-spark integration inherits the underlying splits provided by 
Phoenix, which is a function of the HBase regions, salting and other aspects 
determined by the Phoenix Query Planner.

XD: Is there any documentation on what this function actually is?

Re: #1, as I understand the Spark JDBC connector, it evenly segments the range, 
although it will only work on a numeric column, not a compound row key.

Re: #2, again, as I understand Spark JDBC, I don't believe that's an option, or 
perhaps it will default to only providing 1 partition, i.e., one very large 
query.

Re: data-locality, the underlying Phoenix Hadoop Input Format isn't yet 
node-aware. There are some data locality advantages gained by co-locating the 
Spark executors to the RegionServers, but it could be improved. It's worth 
filing a JIRA enhancement ticket for that.

XD: A JIRA enhancement will be great.
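
To make sure I understand the difference, here is a minimal PySpark sketch of
the two read paths discussed above. It is only a sketch: the ZooKeeper quorum,
bounds and partition count are placeholders, and it assumes Spark 2.x with the
Phoenix client and phoenix-spark jars on the classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("phoenix-partitioning-sketch").getOrCreate()

# 1) Plain Spark JDBC: Spark issues numPartitions range queries by evenly
#    segmenting [lowerBound, upperBound] on a single numeric column.
jdbc_df = (spark.read.format("jdbc")
           .option("url", "jdbc:phoenix:zk-host:2181:/hbase")
           .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
           .option("dbtable", "TABLE3")
           .option("partitionColumn", "COL3")   # must be numeric
           .option("lowerBound", "0")
           .option("upperBound", "1000000")
           .option("numPartitions", "16")
           .load())

# 2) phoenix-spark connector: partitions follow the splits produced by the
#    Phoenix query planner (HBase regions, salting, statistics guideposts).
phoenix_df = (spark.read.format("org.apache.phoenix.spark")
              .option("table", "TABLE3")
              .option("zkUrl", "zk-host:2181:/hbase")
              .load())

print(jdbc_df.rdd.getNumPartitions(), phoenix_df.rdd.getNumPartitions())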

Thanks

Xindian

On Mon, Sep 19, 2016 at 12:48 PM, Long, Xindian 
> wrote:
How are DataFrames/Datasets/RDDs partitioned by default when using Spark, 
assuming the DataFrame/Dataset/RDD is the result of a query like this:

select col1, col2, col3 from table3 where col3 > xxx

I noticed that for HBase, a partitioner partitions the row keys based on region 
splits; can Phoenix do this as well?

I also read that if I use Spark with the Phoenix JDBC interface, “it’s only able 
to parallelize queries by partitioning on a numeric column. It also requires a 
known lower bound, upper bound and partition count in order to create split 
queries.”

Question 1: If I specify an option like this, is the partitioning based on 
segmenting the range evenly, i.e. each partition gets a row-key range of roughly 
(upperLimit - lowerLimit) / partitionCount?

Question 2: If I do not specify any range, or the row key is not a numeric 
column, how is the result partitioned when using JDBC?


If I use the phoenix-spark plug-in, it is mentioned that it is able to 
leverage the underlying splits provided by Phoenix.
Are there any example scenarios of that? E.g. can it partition the resulting 
DataFrame based on regions in the underlying HBase table, so that Spark can take 
advantage of the locality of the data?

Thanks

Xindian



Re: bulk-delete spark phoenix

2016-09-29 Thread dalin.qin
Hi Fabio,

I have the very same requirement in my environment. My approach is to collect
all the data you need to delete, save it into one HBase table, and then
issue the delete statement. The following is sample code; hope this can
help.

# CLIENTMAC, ENTITYID and ETIME form the primary key of both tables.
# DriverManager is fully qualified so py4j can resolve it without a java_import.
conn = sc._jvm.java.sql.DriverManager.getConnection(
    "jdbc:phoenix:namenode:2181:/hbase-unsecure")
conn.setAutoCommit(True)  # an auto-committed DELETE runs entirely server-side
stmt = conn.createStatement()
stmt.execute(
    "DELETE FROM t_table1 WHERE EXISTS ("
    "  SELECT 1 FROM t_table1_for_delete"
    "  WHERE t_table1_for_delete.ENTITYID = t_table1.ENTITYID"
    "    AND t_table1_for_delete.CLIENTMAC = t_table1.CLIENTMAC"
    "    AND t_table1_for_delete.ETIME = t_table1.ETIME)")
conn.close()

Dalin

On Thu, Sep 29, 2016 at 9:30 AM, fabio ferrante 
wrote:

> Hi Josh,
> using a regular DELETE is not a bulk delete. Bulk operations use JDBC
> statement batching, collecting up to 1000 statements into a single execution,
> reducing the amount of communication overhead and improving performance.
> What I really want to do is add a deleteFromPhoenix() method to
> ProductRDD to simplify the scenario where an RDD doesn't contain new data
> to save but old data to delete from Phoenix.
>
> FF.
>
> --
> *From:* Josh Mahonin [mailto:jmaho...@gmail.com]
> *Sent:* Wednesday, September 28, 2016 20:29
> *To:* user@phoenix.apache.org
> *Subject:* Re: bulk-delete spark phoenix
>
> Hi Fabio,
>
> You could probably just execute a regular DELETE query from a JDBC call,
> which is generally safe to do either from the Spark driver or within an
> executor. As long as auto-commit is enabled, it's an entirely server side
> operation: https://phoenix.apache.org/language/#delete
>
> Josh
>
> On Wed, Sep 28, 2016 at 2:13 PM, fabio ferrante 
> wrote:
>
>> Hi,
>>
>> I would like to perform a bulk delete against HBase using Apache Phoenix from
>> Spark. Using the Phoenix-Spark plugin I can successfully perform a bulk load
>> using the saveToPhoenix method from PhoenixRDD, but how can I perform a bulk
>> delete? There isn't a deleteFromPhoenix method in PhoenixRDD. Is that
>> correct? Is implementing such a method a trivial task?
>>
>> Thanks in advance,
>>  Fabio.
>>
>
>


RE: Loading via MapReduce, Not Moving HFiles to HBase

2016-09-29 Thread Ravi Kumar Bommada
Hi Gabriel,

Please find the logs attached.

R’s
Ravi Kumar B

From: Gabriel Reid [mailto:gabriel.r...@gmail.com]
Sent: Wednesday, September 28, 2016 5:51 PM
To: user@phoenix.apache.org
Subject: Re: Loading via MapReduce, Not Moving HFiles to HBase

Hi Ravi,

It looks like those log file entries you posted are from a mapreduce task. 
Could you post the output of the command that you're using to start the actual 
job (i.e. the console output of "hadoop jar ...")?

- Gabriel

On Wed, Sep 28, 2016 at 1:49 PM, Ravi Kumar Bommada 
> wrote:
Hi All,

I’m trying to load data via Phoenix MapReduce, referring to the screen below:

[inline screenshot omitted]

HFiles are getting created (176 HFiles of about 300 MB each), but after that the
files are not moving to HBase, i.e. when I query HBase I am not able to see the
data. According to the logs below, the data commit is successful.

Please suggest if I’m missing any configuration.

Provided:

Using property: -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=1024

Last Few Logs:
2016-09-27 07:27:35,845 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.snappy]
2016-09-27 07:27:35,846 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.snappy]
2016-09-27 07:27:35,846 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.snappy]
2016-09-27 07:27:35,846 INFO [main] org.apache.hadoop.mapred.Merger: Merging 64 
intermediate segments out of a total of 127
2016-09-27 07:28:21,238 INFO [main] org.apache.hadoop.mapred.Merger: Down to 
the last merge-pass, with 64 segments left of total size: -40111574372 bytes
2016-09-27 07:30:24,933 INFO [main] org.apache.hadoop.mapred.Merger: Merging 
179 sorted segments
2016-09-27 07:30:24,965 INFO [main] org.apache.hadoop.mapred.Merger: Down to 
the last merge-pass, with 0 segments left of total size: 4736 bytes
2016-09-27 07:30:24,967 INFO [main] org.apache.hadoop.mapred.Merger: Merging 
179 sorted segments
2016-09-27 07:30:24,999 INFO [main] org.apache.hadoop.mapred.Merger: Down to 
the last merge-pass, with 0 segments left of total size: 4736 bytes
2016-09-27 07:30:25,000 INFO [main] org.apache.hadoop.mapred.Merger: Merging 
179 sorted segments
2016-09-27 07:30:25,033 INFO [main] org.apache.hadoop.mapred.Merger: Down to 
the last merge-pass, with 0 segments left of total size: 4736 bytes
2016-09-27 07:30:25,035 INFO [main] org.apache.hadoop.mapred.Merger: Merging 
179 sorted segments
2016-09-27 07:30:25,068 INFO [main] org.apache.hadoop.mapred.Merger: Down to 
the last merge-pass, with 0 segments left of total size: 4736 bytes
2016-09-27 07:30:25,723 INFO [main] org.apache.hadoop.mapred.Task: 
Task:attempt_1467713708066_29809_m_16_0 is done. And is in the process of 
committing
2016-09-27 07:30:25,788 INFO [main] org.apache.hadoop.mapred.Task: Task 
'attempt_1467713708066_29809_m_16_0' done.


Regards,

Ravi Kumar B
Mob: +91 9591144511








R: bulk-delete spark phoenix

2016-09-29 Thread fabio ferrante
Hi Josh,
using a regular DELETE is not a bulk delete. Bulk operations use JDBC
statement batching, collecting up to 1000 statements into a single execution,
reducing the amount of communication overhead and improving performance.
What I really want to do is add a deleteFromPhoenix() method to ProductRDD
to simplify the scenario where an RDD doesn't contain new data to save
but old data to delete from Phoenix.
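
For what it's worth, the batching described above can already be approximated
from the driver with a JDBC PreparedStatement, pending a real
deleteFromPhoenix(). This is only a sketch: it is meant to run in the PySpark
driver (where sc is available), the table and column names and the key list
are placeholders, the key columns are assumed to be VARCHARs, and the
1000-statement batch size mirrors the figure mentioned above.

# Keys to delete; in practice these might come from rdd.collect().
keys_to_delete = [("e1", "aa:bb:cc:dd:ee:ff", "2016-09-01 00:00:00")]

conn = sc._jvm.java.sql.DriverManager.getConnection(
    "jdbc:phoenix:namenode:2181:/hbase-unsecure")
conn.setAutoCommit(False)
ps = conn.prepareStatement(
    "DELETE FROM t_table1 WHERE ENTITYID = ? AND CLIENTMAC = ? AND ETIME = ?")
for i, (entityid, clientmac, etime) in enumerate(keys_to_delete, start=1):
    ps.setString(1, entityid)
    ps.setString(2, clientmac)
    ps.setString(3, etime)
    ps.addBatch()
    if i % 1000 == 0:          # flush a batch of 1000 statements
        ps.executeBatch()
        conn.commit()
ps.executeBatch()
conn.commit()
conn.close()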
 
FF.
 

From: Josh Mahonin [mailto:jmaho...@gmail.com]
Sent: Wednesday, September 28, 2016 20:29
To: user@phoenix.apache.org
Subject: Re: bulk-delete spark phoenix


Hi Fabio, 

You could probably just execute a regular DELETE query from a JDBC call,
which is generally safe to do either from the Spark driver or within an
executor. As long as auto-commit is enabled, it's an entirely server side
operation: https://phoenix.apache.org/language/#delete

Josh

On Wed, Sep 28, 2016 at 2:13 PM, fabio ferrante 
wrote:



Hi,
 
I would like to perform a bulk delete against HBase using Apache Phoenix from
Spark. Using the Phoenix-Spark plugin I can successfully perform a bulk load
using the saveToPhoenix method from PhoenixRDD, but how can I perform a bulk
delete? There isn't a deleteFromPhoenix method in PhoenixRDD. Is that
correct? Is implementing such a method a trivial task?
 
Thanks in advance,
 Fabio.